Unicode Technical Report #11
Unicode Character Property "East Asian Width"

Revision	2.0
Authors	Asmus Freytag, Mark Davis and Ken Whistler
Date	Dec 11, 1998
This Version	http://www.unicode.org/unicode/reports/tr11-2
Previous Version	http://www.unicode.org/unicode/reports/dtr11.html
Latest Version	http://www.unicode.org/unicode/reports/tr11

Summary

This report presents the specifications of a new property for Unicode characters.

Status of this document

This document has been considered and approved by the Unicode Technical Committee for publication as a Technical Report. At the current time, the specifications in this technical report are provided as information and guidance to implementers of the Unicode Standard, but do not form part of the standard itself. The Unicode Technical Committee may decide to incorporate all or part of the material of this technical report into a future version of the Unicode Standard, either as informative or as normative specification. Please mail corrigenda and other comments to errata@unicode.org.

East Asian Width Property

Overview

In mixed-width, East Asian, legacy encodings there is a concept of an inherent width of a character. For a fixed pitch font, this width translates to a display width of either one half or a whole unit width. A common name for this unit width is "Em". It is customarily the height of the letter 'M', but since in East Asian fonts the standard character cell is square, it is the same as the unit width.

NOTE: the character width for a fixed pitch Latin font like Courier is 3/5 of an em.

Layout and line breaking (to cite only two examples) in an East Asian context show systematic variations depending on the value of the East Asian Width property (even for non-fixed pitch fonts). Further, the same information is useful in creating correct transcoding tables for East Asian character sets.

Scope

The East Asian Width property provides a useful concept for implementations that

have to interwork with East Asian legacy character encodings
support both East Asian and Western typography and line layout
need to associate fonts with unmarked text runs containing East Asian characters

This Unicode Technical Report does not provide rules or specifications of how this property might be used in font design or line layout, since, while a useful property for this purpose, it is only one of several character properties that would need to be considered.

Description

By convention, 1/2 Em wide characters of East Asian legacy encodings are called "half-width" (or hankaku characters in Japanese), the others are called correspondingly "full-width" (or zenkaku) characters. Legacy encodings often use a single byte for the half-width characters and two bytes for the full-width characters. In the Unicode Standard, no such distinction is made, but understanding the distinction is often necessary when interchanging data with legacy systems, especially when fixed size buffers are involved.
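To illustrate the fixed-size buffer concern, the following Python sketch measures how many bytes a string occupies in a mixed-width legacy encoding. It assumes Python's standard shift_jis codec as a stand-in for such an encoding; the function names legacy_byte_length and fits_in_buffer are hypothetical and not part of this report.

```python
def legacy_byte_length(text: str, encoding: str = "shift_jis") -> int:
    """Return the number of bytes `text` occupies in the legacy encoding.

    Raises UnicodeEncodeError if a character has no mapping in `encoding`.
    """
    return len(text.encode(encoding))


def fits_in_buffer(text: str, size: int, encoding: str = "shift_jis") -> bool:
    """Check whether the encoded text fits a fixed-size buffer of `size` bytes."""
    return legacy_byte_length(text, encoding) <= size


print(legacy_byte_length("ABC"))       # three single-byte characters
print(legacy_byte_length("日本語"))     # three double-byte characters
print(fits_in_buffer("A日本語", 6))     # one single-byte plus three double-byte characters
```

The character count of a Unicode string therefore says little about the space it needs in the legacy encoding; only the per-character width does.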

Some character blocks in the compatibility zone contain characters that are explicitly marked "half-width" and "full-width" in their character name, but for all other characters the width property must be implicitly derived. Some characters behave differently in an East Asian context than in a non-East Asian context. Their default width property is considered ambiguous and needs to be resolved into an actual width property based on context.

This technical report assigns to each Unicode character one of six values as its default width property: Ambiguous, Full Width, Half Width, Narrow, Wide, or Not East Asian Neutral (defined below). For any given operation, these six default properties resolve into only two property values, narrow and wide, depending on context.
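The resolution from six default values to two can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm: the enum EAWidth, its short labels, and the function resolve_width are hypothetical names, and the single boolean east_asian_context stands in for whatever context (font, language tag, data source) an implementation actually has.

```python
from enum import Enum


class EAWidth(Enum):
    """The six default width values (short labels are illustrative only)."""
    AMBIGUOUS = "A"
    FULLWIDTH = "F"
    HALFWIDTH = "H"
    NARROW = "Na"   # label chosen here so it does not clash with Neutral
    NEUTRAL = "N"
    WIDE = "W"


def resolve_width(cls: EAWidth, east_asian_context: bool) -> str:
    """Resolve a default width value into 'narrow' or 'wide'."""
    if cls in (EAWidth.WIDE, EAWidth.FULLWIDTH):
        return "wide"
    if cls is EAWidth.AMBIGUOUS:
        # Ambiguous characters need context to decide.
        return "wide" if east_asian_context else "narrow"
    # Half-width, Narrow, and Neutral are all treated as narrow.
    return "narrow"


print(resolve_width(EAWidth.AMBIGUOUS, east_asian_context=True))
print(resolve_width(EAWidth.AMBIGUOUS, east_asian_context=False))
```

Only the Ambiguous value depends on the context argument; the other five resolve the same way regardless.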

Definitions

All terms not defined here shall be as defined in the Unicode Standard.

East Asian Width - In the context of interoperating with East Asian legacy character encodings and implementing East Asian typography, character width is an abstract concept. It can take on two values, narrow and wide. The actual display width of a glyph is given by the font. An important class of fixed width legacy fonts contains glyphs of just two widths, with the wider glyphs twice as wide as the narrower glyphs.

East Asian Wide (W) - There are wide characters that are defined as full-width and also wide characters that are implicitly wide (such as the Unified Han Ideographs or Squared Katakana Symbols) because they occur only in the context of East Asian typography where they are wide characters.

East Asian FullWidth (FW) - East Asian Wide characters that are defined as full width and therefore are compatibility equivalents of implicitly narrow but unmarked characters elsewhere in the Unicode Standard. FW characters form a proper subset of W characters.

East Asian Narrow (N) - There are narrow characters that are defined as half-width and also characters that are half-width by implication because they have full-width clones (all of ASCII is an example).

East Asian Half-width (HW) - Narrow characters that are defined as half-width and therefore are compatibility characters of implicitly wide, but unmarked characters elsewhere in the Unicode Standard. HW characters form a proper subset of N characters.

Note: Because half-width punctuation behaves in some important ways like ideographic punctuation, it is useful to distinguish characters defined as half-width from characters that are narrow by implication. Since this information cannot be trivially derived from the block names, it is provided explicitly below.

East Asian Ambiguous (A) - Characters that occur in East Asian legacy character sets as wide characters, and as narrow characters in their own local or non-East Asian usage. (Examples are the Greek and Cyrillic alphabets found in East Asian character sets, as well as some of the mathematical symbols.) Ambiguous characters require context to resolve their width.

Note: Because East Asian legacy character sets do not always include complete case pairs of Latin characters, two members of a pair may have different EA Width properties:
Ambiguous: 	01D4    LATIN SMALL LETTER U WITH CARON
NEA Neutral:	01D3    LATIN CAPITAL LETTER U WITH CARON

Not East Asian (Neutral) - All characters that do not occur in legacy East Asian character sets. By extension, they also do not occur in East Asian typography. (There is no traditional Japanese way of typesetting Devanagari, for example). Narrow and Neutral characters are treated the same under the recommendations below, so their distinction is a matter of convenience.

diagram (informative)

Figure 1: Venn diagram showing the set relations for five of the six categories.

Relation to "full-width" and "half-width"

When converting a DBCS mixed-width encoding to and from Unicode, the full-width characters in such a mixed-width encoding are mapped to the full-width compatibility characters in the FFxx block, whereas the corresponding half-width characters are mapped to ordinary Unicode characters (e.g. ASCII in U+0021..U+007E, plus a few other scattered characters).

In the context of interoperability with DBCS character encodings, that restricted set of Unicode characters in the General Scripts area can be construed as half-width, rather than full-width. (This applies only to the restricted set of characters which can be paired with the full-width compatibility characters.)

In the context of interoperability with DBCS character encodings, all other Unicode characters which are not explicitly marked as half-width can be construed as full-width.

In any other context, Unicode characters not explicitly marked as being either full-width or half-width compatibility forms should be construed as unmarked as to half-width versus full-width status.

Seen in this light, the "half-width" and "full-width" properties are not unitary character properties in the same sense as "space" or "combining" or "alphabetic". They are, instead, relational properties of a pair of characters, one of which is explicitly encoded as a half-width or full-width form for compatibility in mapping to DBCS mixed-width character encodings.
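The pairing between ordinary characters and their full-width compatibility forms can be illustrated with a short Python sketch. It relies on the fact that U+FF01..U+FF5E mirror U+0021..U+007E in the same order, so the two ranges differ by a constant offset. The function names to_fullwidth and to_ordinary are hypothetical, and the other scattered pairs (such as those in U+FFE0..U+FFE6) are deliberately not handled.

```python
# Constant distance between U+0021..U+007E and U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFF01 - 0x0021


def to_fullwidth(text: str) -> str:
    """Map printable ASCII (U+0021..U+007E) to its full-width counterpart."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )


def to_ordinary(text: str) -> str:
    """Map full-width forms (U+FF01..U+FF5E) back to ordinary ASCII."""
    return "".join(
        chr(ord(c) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(c) <= 0xFF5E else c
        for c in text
    )


sample = "A1!"
print(to_fullwidth(sample))
print(to_ordinary(to_fullwidth(sample)) == sample)
```

Characters outside the paired ranges pass through unchanged, which reflects the relational nature of the property: only characters that have a partner are affected.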

What is "full-width" by default today could in theory become "half-width" tomorrow by the introduction of another character on the SBCS part of a mixed-width code page somewhere, requiring the introduction of another full-width compatibility character to complete the mapping. Since the single byte part of mixed-width character sets is limited, there are not going to be many candidates, and neither the UTC nor WG2 has any intention of adding further compatibility characters for this purpose.

Conformance

East Asian Width is an informative character property.

Recommendation (informative)

When interchanging data

Wide characters always map to full-width characters in the mixed-width set
Wide characters never map to non-East Asian legacy character encodings
Narrow (and neutral) characters always map to half-width characters in the mixed-width set
Half-width characters always map to half-width characters in the mixed-width set
Ambiguous characters always map to full-width characters in East Asian legacy character encodings
Ambiguous characters always map to regular (narrow) characters in non-East Asian legacy character encodings
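The interchange rules above can be observed with existing codecs. The following Python sketch assumes Python's standard shift_jis and iso8859_7 codecs as examples of an East Asian mixed-width set and a non-East Asian set, respectively; the helper encoded_size is a hypothetical name introduced for illustration.

```python
from typing import Optional


def encoded_size(ch: str, encoding: str) -> Optional[int]:
    """Return the byte count of `ch` in `encoding`, or None if unmappable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA, listed as Ambiguous
han = "\u65e5"    # a Unified Han ideograph, listed as Wide

print(encoded_size(alpha, "shift_jis"))  # ambiguous character in an East Asian set
print(encoded_size(alpha, "iso8859_7"))  # same character in a non-East Asian set
print(encoded_size(han, "shift_jis"))    # wide character in an East Asian set
print(encoded_size(han, "iso8859_7"))    # wide character has no mapping here
```

The ambiguous character takes a double-byte (full-width) code in the East Asian set and a single-byte code in the Greek set, while the wide character is simply unmappable outside East Asian encodings.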

When processing or displaying data

Wide characters behave like ideographs in important ways. In fixed pitch fonts, they take up one Em of space.
Half-width characters behave like ideographs in some ways. In fixed pitch fonts, they take up 1/2 Em of space.
Narrow characters behave like Western characters in important ways. In fixed pitch East Asian fonts, they take up 1/2 Em of space.
Ambiguous characters behave like wide or narrow characters depending on context (language tag, associated font, source of data, or explicit markup can all provide the context).
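For fixed pitch display, the rules above amount to counting one cell of 1/2 Em for each narrow character and two cells for each wide character. The Python sketch below assumes the standard library function unicodedata.east_asian_width, which reports values from the Unicode Character Database version bundled with Python (with the codes F, H, W, Na, A, N) rather than the Version 2.1 list in this report. The function name display_cells is hypothetical, and the sketch counts every character, including combining marks and controls, as at least one cell.

```python
import unicodedata


def display_cells(text: str, east_asian_context: bool = False) -> int:
    """Count half-Em cells needed for `text` in a fixed pitch East Asian font."""
    cells = 0
    for ch in text:
        cls = unicodedata.east_asian_width(ch)
        if cls in ("W", "F") or (cls == "A" and east_asian_context):
            cells += 2  # wide: one full Em
        else:
            cells += 1  # narrow, half-width, neutral: 1/2 Em
    return cells


print(display_cells("ABC"))
print(display_cells("日本語"))
print(display_cells("\u03b1", east_asian_context=True))
print(display_cells("\u03b1", east_asian_context=False))
```

A production implementation would also need to treat combining marks and control characters separately, which this sketch does not attempt.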

Classifications (informative)

The classifications presented here are based on the most widely used mixed-width legacy character sets in use in East Asia as of this writing. In particular, the assignment of the neutral or ambiguous categories depends on the contents of these character sets. For example, an implementation that knows a priori that it only needs to interchange data with the Japanese Shift-JIS character set, but not other East Asian character sets, could reduce the number of characters in the ambiguous classification to those actually encoded in Shift-JIS. Alternatively, such a reduction could be done implicitly at runtime in the context of interoperating with Shift-JIS fonts or data sources. Conversely, if additional character sets are created and widely adopted for legacy purposes, more characters would need to be classified as ambiguous.
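The reduction of the ambiguous set for a single target character set can be sketched as below. The sketch assumes Python's unicodedata.east_asian_width (which reflects the Unicode Character Database bundled with Python, not the Version 2.1 list in this report) and uses Python's shift_jis and gb2312 codecs as approximations of the corresponding legacy character sets. The function names encodable and is_ambiguous_for are hypothetical.

```python
import unicodedata


def encodable(ch: str, encoding: str) -> bool:
    """Return True if `ch` has a mapping in the given legacy encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def is_ambiguous_for(ch: str, encoding: str) -> bool:
    """Treat `ch` as ambiguous only if the target character set contains it."""
    return unicodedata.east_asian_width(ch) == "A" and encodable(ch, encoding)


alpha = "\u03b1"    # GREEK SMALL LETTER ALPHA
a_caron = "\u01ce"  # LATIN SMALL LETTER A WITH CARON

print(is_ambiguous_for(alpha, "shift_jis"))
print(is_ambiguous_for(a_caron, "shift_jis"))
print(is_ambiguous_for(a_caron, "gb2312"))
```

An implementation that only exchanges data with one character set can use such a filter to treat the remaining ambiguous characters as narrow.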

All characters not listed here are by default classified as non-East Asian neutral.

East Asian Width classification of characters of the Unicode Standard, Version 2.1

The classifications are given in an annotated list where each line consists of either a character code XXXX or an inclusive character code range XXXX..YYYY followed by a comment delimiter # and the UTF-8 codes for XXXX and YYYY (these may or may not show correctly on your browser) and finally the Unicode character names for XXXX and YYYY. All information following the # sign may be ignored.
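A data line in this format can be read with a few lines of Python. The sketch below is one possible reading of the format described above; the function name parse_range_line is hypothetical, and the category header lines (such as "A - Ambiguous") are not data lines and must be handled separately by the caller.

```python
from typing import Optional, Tuple


def parse_range_line(line: str) -> Optional[Tuple[int, int]]:
    """Parse 'XXXX' or 'XXXX..YYYY' before the '#' delimiter.

    Returns an inclusive (first, last) pair of code points, or None if the
    line carries no data before the comment delimiter.
    """
    data = line.split("#", 1)[0].strip()  # everything after '#' may be ignored
    if not data:
        return None
    first, _, last = data.partition("..")
    return int(first, 16), int(last or first, 16)


print(parse_range_line("00A7..00A8\t# §..¨;\tSECTION SIGN..DIAERESIS"))
print(parse_range_line("00A1\t\t# ¡;\tINVERTED EXCLAMATION MARK"))
```

A single code point is returned as a range whose first and last values are equal, so callers can treat both line shapes uniformly.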

A - Ambiguous 
00A1		# ¡;	INVERTED EXCLAMATION MARK
00A4		# ¤;	CURRENCY SIGN
00A7..00A8	# §..¨;	SECTION SIGN..DIAERESIS
00AA		# ª;	FEMININE ORDINAL INDICATOR
00AD		# ;	SOFT HYPHEN
00B0..00B4	# °..´;	DEGREE SIGN..ACUTE ACCENT
00B6..00BA	# ¶..º;	PILCROW SIGN..MASCULINE ORDINAL INDICATOR
00BC..00BF	# ¼..¿;	VULGAR FRACTION ONE QUARTER..INVERTED QUESTION MARK
00C6		# Æ;	LATIN CAPITAL LETTER AE
00D0		# Ð;	LATIN CAPITAL LETTER ETH
00D7..00D8	# ×..Ø;	MULTIPLICATION SIGN..LATIN CAPITAL LETTER O WITH STROKE
00DE..00E1	# Þ..á;	LATIN CAPITAL LETTER THORN..LATIN SMALL LETTER A WITH ACUTE
00E6		# æ;	LATIN SMALL LETTER AE
00E8..00EA	# è..ê;	LATIN SMALL LETTER E WITH GRAVE..LATIN SMALL LETTER E WITH CIRCUMFLEX
00EC..00ED	# ì..í;	LATIN SMALL LETTER I WITH GRAVE..LATIN SMALL LETTER I WITH ACUTE
00F0		# ð;	LATIN SMALL LETTER ETH
00F2..00F3	# ò..ó;	LATIN SMALL LETTER O WITH GRAVE..LATIN SMALL LETTER O WITH ACUTE
00F7..00FA	# ÷..ú;	DIVISION SIGN..LATIN SMALL LETTER U WITH ACUTE
00FC		# ü;	LATIN SMALL LETTER U WITH DIAERESIS
00FE		# þ;	LATIN SMALL LETTER THORN
0101		# ā;	LATIN SMALL LETTER A WITH MACRON
0111		# đ;	LATIN SMALL LETTER D WITH STROKE
0113		# ē;	LATIN SMALL LETTER E WITH MACRON
011B		# ě;	LATIN SMALL LETTER E WITH CARON
0126..0127	# Ħ..ħ;	LATIN CAPITAL LETTER H WITH STROKE..LATIN SMALL LETTER H WITH STROKE
012B		# ī;	LATIN SMALL LETTER I WITH MACRON
0131..0133	# ı..ĳ;	LATIN SMALL LETTER DOTLESS I..LATIN SMALL LIGATURE IJ
0138		# ĸ;	LATIN SMALL LETTER KRA
013F..0142	# Ŀ..ł;	LATIN CAPITAL LETTER L WITH MIDDLE DOT..LATIN SMALL LETTER L WITH STROKE
0144		# ń;	LATIN SMALL LETTER N WITH ACUTE
0148..014B	# ň..ŋ;	LATIN SMALL LETTER N WITH CARON..LATIN SMALL LETTER ENG
014D		# ō;	LATIN SMALL LETTER O WITH MACRON
0152..0153	# Œ..œ;	LATIN CAPITAL LIGATURE OE..LATIN SMALL LIGATURE OE
0166..0167	# Ŧ..ŧ;	LATIN CAPITAL LETTER T WITH STROKE..LATIN SMALL LETTER T WITH STROKE
016B		# ū;	LATIN SMALL LETTER U WITH MACRON
01CE		# ǎ;	LATIN SMALL LETTER A WITH CARON
01D0		# ǐ;	LATIN SMALL LETTER I WITH CARON
01D2		# ǒ;	LATIN SMALL LETTER O WITH CARON
01D4		# ǔ;	LATIN SMALL LETTER U WITH CARON
01D6		# ǖ;	LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
01D8		# ǘ;	LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE
01DA		# ǚ;	LATIN SMALL LETTER U WITH DIAERESIS AND CARON
01DC		# ǜ;	LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE
0251		# ɑ;	LATIN SMALL LETTER ALPHA
0261		# ɡ;	LATIN SMALL LETTER SCRIPT G
02C7		# ˇ;	CARON
02C9..02CB	# ˉ..ˋ;	MODIFIER LETTER MACRON..MODIFIER LETTER GRAVE ACCENT
02CD		# ˍ;	MODIFIER LETTER LOW MACRON
02D0		# ː;	MODIFIER LETTER TRIANGULAR COLON
02D8..02DB	# ˘..˛;	BREVE..OGONEK
02DD		# ˝;	DOUBLE ACUTE ACCENT
0300..0361	# ◦̀..◦͡;	COMBINING GRAVE ACCENT..COMBINING DOUBLE INVERTED BREVE
0391..03A9	# Α..Ω;	GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER OMEGA
03B1..03C1	# α..ρ;	GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER RHO
03C3..03C9	# σ..ω;	GREEK SMALL LETTER SIGMA..GREEK SMALL LETTER OMEGA
0401		# Ё;	CYRILLIC CAPITAL LETTER IO
0410..044F	# А..я;	CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
0451		# ё;	CYRILLIC SMALL LETTER IO
2010		# ‐;	HYPHEN
2013..2016	# –..‖;	EN DASH..DOUBLE VERTICAL LINE
2018..2019	# ‘..’;	LEFT SINGLE QUOTATION MARK..RIGHT SINGLE QUOTATION MARK
201C..201D	# “..”;	LEFT DOUBLE QUOTATION MARK..RIGHT DOUBLE QUOTATION MARK
2020..2021	# †..‡;	DAGGER..DOUBLE DAGGER
2025..2027	# ‥..‧;	TWO DOT LEADER..HYPHENATION POINT
2030		# ‰;	PER MILLE SIGN
2032..2033	# ′..″;	PRIME..DOUBLE PRIME
2035		# ‵;	REVERSED PRIME
203B		# ※;	REFERENCE MARK
2074		# ⁴;	SUPERSCRIPT FOUR
207F		# ⁿ;	SUPERSCRIPT LATIN SMALL LETTER N
2081..2084	# ₁..₄;	SUBSCRIPT ONE..SUBSCRIPT FOUR
20AC		# €;	EURO SIGN
2103		# ℃;	DEGREE CELSIUS
2105		# ℅;	CARE OF
2109		# ℉;	DEGREE FAHRENHEIT
2113		# ℓ;	SCRIPT SMALL L
2116		# №;	NUMERO SIGN
2121..2122	# ℡..™;	TELEPHONE SIGN..TRADE MARK SIGN
2126		# Ω;	OHM SIGN
212B		# Å;	ANGSTROM SIGN
2153..2154	# ⅓..⅔;	VULGAR FRACTION ONE THIRD..VULGAR FRACTION TWO THIRDS
215B..215E	# ⅛..⅞;	VULGAR FRACTION ONE EIGHTH..VULGAR FRACTION SEVEN EIGHTHS
2160..216B	# Ⅰ..Ⅻ;	ROMAN NUMERAL ONE..ROMAN NUMERAL TWELVE
2170..2179	# ⅰ..ⅹ;	SMALL ROMAN NUMERAL ONE..SMALL ROMAN NUMERAL TEN
2190..2199	# ←..↙;	LEFTWARDS ARROW..SOUTH WEST ARROW
21D2		# ⇒;	RIGHTWARDS DOUBLE ARROW
21D4		# ⇔;	LEFT RIGHT DOUBLE ARROW
2200		# ∀;	FOR ALL
2202..2203	# ∂..∃;	PARTIAL DIFFERENTIAL..THERE EXISTS
2207..2208	# ∇..∈;	NABLA..ELEMENT OF
220B		# ∋;	CONTAINS AS MEMBER
220F		# ∏;	N-ARY PRODUCT
2211		# ∑;	N-ARY SUMMATION
2215		# ∕;	DIVISION SLASH
221A		# √;	SQUARE ROOT
221D..2220	# ∝..∠;	PROPORTIONAL TO..ANGLE
2223		# ∣;	DIVIDES
2225		# ∥;	PARALLEL TO
2227..222C	# ∧..∬;	LOGICAL AND..DOUBLE INTEGRAL
222E		# ∮;	CONTOUR INTEGRAL
2234..2237	# ∴..∷;	THEREFORE..PROPORTION
223C..223D	# ∼..∽;	TILDE OPERATOR..REVERSED TILDE
2248		# ≈;	ALMOST EQUAL TO
224C		# ≌;	ALL EQUAL TO
2252		# ≒;	APPROXIMATELY EQUAL TO OR THE IMAGE OF
2260..2261	# ≠..≡;	NOT EQUAL TO..IDENTICAL TO
2264..2267	# ≤..≧;	LESS-THAN OR EQUAL TO..GREATER-THAN OVER EQUAL TO
226A..226B	# ≪..≫;	MUCH LESS-THAN..MUCH GREATER-THAN
226E..226F	# ≮..≯;	NOT LESS-THAN..NOT GREATER-THAN
2282..2283	# ⊂..⊃;	SUBSET OF..SUPERSET OF
2286..2287	# ⊆..⊇;	SUBSET OF OR EQUAL TO..SUPERSET OF OR EQUAL TO
2295		# ⊕;	CIRCLED PLUS
2299		# ⊙;	CIRCLED DOT OPERATOR
22A5		# ⊥;	UP TACK
22BF		# ⊿;	RIGHT TRIANGLE
2312		# ⌒;	ARC
2460..24B5	# ①..⒵;	CIRCLED DIGIT ONE..PARENTHESIZED LATIN SMALL LETTER Z
24D0..24E9	# ⓐ..ⓩ;	CIRCLED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
2500..254B	# ─..╋;	BOX DRAWINGS LIGHT HORIZONTAL..BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL
2550..2574	# ═..╴;	BOX DRAWINGS DOUBLE HORIZONTAL..BOX DRAWINGS LIGHT LEFT
2581..258F	# ▁..▏;	LOWER ONE EIGHTH BLOCK..LEFT ONE EIGHTH BLOCK
2592..25A1	# ▒..□;	MEDIUM SHADE..WHITE SQUARE
25A3..25A9	# ▣..▩;	WHITE SQUARE CONTAINING BLACK SMALL SQUARE..SQUARE WITH DIAGONAL CROSSHATCH FILL
25B2..25B3	# ▲..△;	BLACK UP-POINTING TRIANGLE..WHITE UP-POINTING TRIANGLE
25B6..25B7	# ▶..▷;	BLACK RIGHT-POINTING TRIANGLE..WHITE RIGHT-POINTING TRIANGLE
25BC..25BD	# ▼..▽;	BLACK DOWN-POINTING TRIANGLE..WHITE DOWN-POINTING TRIANGLE
25C0..25C1	# ◀..◁;	BLACK LEFT-POINTING TRIANGLE..WHITE LEFT-POINTING TRIANGLE
25C6..25C8	# ◆..◈;	BLACK DIAMOND..WHITE DIAMOND CONTAINING BLACK SMALL DIAMOND
25CB		# ○;	WHITE CIRCLE
25CE..25D1	# ◎..◑;	BULLSEYE..CIRCLE WITH RIGHT HALF BLACK
25E2..25E5	# ◢..◥;	BLACK LOWER RIGHT TRIANGLE..BLACK UPPER RIGHT TRIANGLE
25EF		# ◯;	LARGE CIRCLE
2605..2606	# ★..☆;	BLACK STAR..WHITE STAR
2609		# ☉;	SUN
260E..260F	# ☎..☏;	BLACK TELEPHONE..WHITE TELEPHONE
261C		# ☜;	WHITE LEFT POINTING INDEX
261E		# ☞;	WHITE RIGHT POINTING INDEX
2640		# ♀;	FEMALE SIGN
2642		# ♂;	MALE SIGN
2660..2661	# ♠..♡;	BLACK SPADE SUIT..WHITE HEART SUIT
2663..2665	# ♣..♥;	BLACK CLUB SUIT..BLACK HEART SUIT
2667..266A	# ♧..♪;	WHITE CLUB SUIT..EIGHTH NOTE
266C..266D	# ♬..♭;	BEAMED SIXTEENTH NOTES..MUSIC FLAT SIGN
266F		# ♯;	MUSIC SHARP SIGN
H - Halfwidth
20A9		# ₩;	WON SIGN
FF61..FF64	# ｡..､;	HALFWIDTH IDEOGRAPHIC FULL STOP..HALFWIDTH IDEOGRAPHIC COMMA
N - Narrow
0020..007E	# ␣..~;	SPACE..TILDE
00A2..00A3	# ¢..£;	CENT SIGN..POUND SIGN
00A5..00A6	# ¥..¦;	YEN SIGN..BROKEN BAR
00AC		# ¬;	NOT SIGN
00AF		# ¯;	MACRON
N - Not-East Asian Neutral
0000..001F	# ^@..^_;	(!control!)..(!control!)
007F..00A0	# ^?.. ;	(!control!)..NO-BREAK SPACE
00A9		# ©;	COPYRIGHT SIGN
00AB		# «;	LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00AE		# ®;	REGISTERED SIGN
00B5		# µ;	MICRO SIGN
00BB		# »;	RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
00C0..00C5	# À..Å;	LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER A WITH RING ABOVE
00C7..00CF	# Ç..Ï;	LATIN CAPITAL LETTER C WITH CEDILLA..LATIN CAPITAL LETTER I WITH DIAERESIS
00D1..00D6	# Ñ..Ö;	LATIN CAPITAL LETTER N WITH TILDE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D9..00DD	# Ù..Ý;	LATIN CAPITAL LETTER U WITH GRAVE..LATIN CAPITAL LETTER Y WITH ACUTE
00E2..00E5	# â..å;	LATIN SMALL LETTER A WITH CIRCUMFLEX..LATIN SMALL LETTER A WITH RING ABOVE
00E7		# ç;	LATIN SMALL LETTER C WITH CEDILLA
00EB		# ë;	LATIN SMALL LETTER E WITH DIAERESIS
00EE..00EF	# î..ï;	LATIN SMALL LETTER I WITH CIRCUMFLEX..LATIN SMALL LETTER I WITH DIAERESIS
00F1		# ñ;	LATIN SMALL LETTER N WITH TILDE
00F4..00F6	# ô..ö;	LATIN SMALL LETTER O WITH CIRCUMFLEX..LATIN SMALL LETTER O WITH DIAERESIS
00FB		# û;	LATIN SMALL LETTER U WITH CIRCUMFLEX
00FD		# ý;	LATIN SMALL LETTER Y WITH ACUTE
00FF..0100	# ÿ..Ā;	LATIN SMALL LETTER Y WITH DIAERESIS..LATIN CAPITAL LETTER A WITH MACRON
0102..0110	# Ă..Đ;	LATIN CAPITAL LETTER A WITH BREVE..LATIN CAPITAL LETTER D WITH STROKE
0112		# Ē;	LATIN CAPITAL LETTER E WITH MACRON
0114..011A	# Ĕ..Ě;	LATIN CAPITAL LETTER E WITH BREVE..LATIN CAPITAL LETTER E WITH CARON
011C..0125	# Ĝ..ĥ;	LATIN CAPITAL LETTER G WITH CIRCUMFLEX..LATIN SMALL LETTER H WITH CIRCUMFLEX
0128..012A	# Ĩ..Ī;	LATIN CAPITAL LETTER I WITH TILDE..LATIN CAPITAL LETTER I WITH MACRON
012C..0130	# Ĭ..İ;	LATIN CAPITAL LETTER I WITH BREVE..LATIN CAPITAL LETTER I WITH DOT ABOVE
0134..0137	# Ĵ..ķ;	LATIN CAPITAL LETTER J WITH CIRCUMFLEX..LATIN SMALL LETTER K WITH CEDILLA
0139..013E	# Ĺ..ľ;	LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER L WITH CARON
0143		# Ń;	LATIN CAPITAL LETTER N WITH ACUTE
0145..0147	# Ņ..Ň;	LATIN CAPITAL LETTER N WITH CEDILLA..LATIN CAPITAL LETTER N WITH CARON
014C		# Ō;	LATIN CAPITAL LETTER O WITH MACRON
014E..0151	# Ŏ..ő;	LATIN CAPITAL LETTER O WITH BREVE..LATIN SMALL LETTER O WITH DOUBLE ACUTE
0154..0165	# Ŕ..ť;	LATIN CAPITAL LETTER R WITH ACUTE..LATIN SMALL LETTER T WITH CARON
0168..016A	# Ũ..Ū;	LATIN CAPITAL LETTER U WITH TILDE..LATIN CAPITAL LETTER U WITH MACRON
016C..01CD	# Ŭ..Ǎ;	LATIN CAPITAL LETTER U WITH BREVE..LATIN CAPITAL LETTER A WITH CARON
01CF		# Ǐ;	LATIN CAPITAL LETTER I WITH CARON
01D1		# Ǒ;	LATIN CAPITAL LETTER O WITH CARON
01D3		# Ǔ;	LATIN CAPITAL LETTER U WITH CARON
01D5		# Ǖ;	LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
01D7		# Ǘ;	LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
01D9		# Ǚ;	LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
01DB		# Ǜ;	LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE
01DD..0250	# ǝ..ɐ;	LATIN SMALL LETTER TURNED E..LATIN SMALL LETTER TURNED A
0252..0260	# ɒ..ɠ;	LATIN SMALL LETTER TURNED ALPHA..LATIN SMALL LETTER G WITH HOOK
0262..02A8	# ɢ..ʨ;	LATIN LETTER SMALL CAPITAL G..LATIN SMALL LETTER TC DIGRAPH WITH CURL
02B0..02C6	# ʰ..ˆ;	MODIFIER LETTER SMALL H..MODIFIER LETTER CIRCUMFLEX ACCENT
02C8		# ˈ;	MODIFIER LETTER VERTICAL LINE
02CC		# ˌ;	MODIFIER LETTER LOW VERTICAL LINE
02CE..02CF	# ˎ..ˏ;	MODIFIER LETTER LOW GRAVE ACCENT..MODIFIER LETTER LOW ACUTE ACCENT
02D1..02D7	# ˑ..˗;	MODIFIER LETTER HALF TRIANGULAR COLON..MODIFIER LETTER MINUS SIGN
02DC		# ˜;	SMALL TILDE
02DE		# ˞;	MODIFIER LETTER RHOTIC HOOK
02E0..02E9	# ˠ..˩;	MODIFIER LETTER SMALL GAMMA..MODIFIER LETTER EXTRA-LOW TONE BAR
0374..0390	# ʹ..ΐ;	GREEK NUMERAL SIGN..GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
03AA..03B0	# Ϊ..ΰ;	GREEK CAPITAL LETTER IOTA WITH DIALYTIKA..GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
03C2		# ς;	GREEK SMALL LETTER FINAL SIGMA
03CA..03EF	# ϊ..ϯ;	GREEK SMALL LETTER IOTA WITH DIALYTIKA..COPTIC SMALL LETTER DEI
0400		# Ѐ;	
0402..040F	# Ђ..Џ;	CYRILLIC CAPITAL LETTER DJE..CYRILLIC CAPITAL LETTER DZHE
0450		# ѐ;	
0452..0486	# ђ..◦҆;	CYRILLIC SMALL LETTER DJE..COMBINING CYRILLIC PSILI PNEUMATA
0490..04F9	# Ґ..ӹ;	CYRILLIC CAPITAL LETTER GHE WITH UPTURN..CYRILLIC SMALL LETTER YERU WITH DIAERESIS
0531..0556	# Ա..Ֆ;	ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH
0559..055F	# ՙ..՟;	ARMENIAN MODIFIER LETTER LEFT HALF RING..ARMENIAN ABBREVIATION MARK
0561..0587	# ա..և;	ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
0589		# ։;	ARMENIAN FULL STOP
0591..05F4	# ◦֑..״;	HEBREW ACCENT ETNAHTA..HEBREW PUNCTUATION GERSHAYIM
060C..06F9	# ،..۹;	ARABIC COMMA..EXTENDED ARABIC-INDIC DIGIT NINE
0901..0970	# ◦ँ..॰;	DEVANAGARI SIGN CANDRABINDU..DEVANAGARI ABBREVIATION SIGN
0981..09FA	# ◦ঁ..৺;	BENGALI SIGN CANDRABINDU..BENGALI ISSHAR
0A02..0A74	# ◦ਂ..ੴ;	GURMUKHI SIGN BINDI..GURMUKHI EK ONKAR
0A81..0AEF	# ◦ઁ..૯;	GUJARATI SIGN CANDRABINDU..GUJARATI DIGIT NINE
0B01..0B70	# ◦ଁ..୰;	ORIYA SIGN CANDRABINDU..ORIYA ISSHAR
0B82..0BF2	# ◦ஂ..௲;	TAMIL SIGN ANUSVARA..TAMIL NUMBER ONE THOUSAND
0C01..0C6F	# ◦ఁ..౯;	TELUGU SIGN CANDRABINDU..TELUGU DIGIT NINE
0C82..0CEF	# ◦ಂ..೯;	KANNADA SIGN ANUSVARA..KANNADA DIGIT NINE
0D02..0D6F	# ◦ം..൯;	MALAYALAM SIGN ANUSVARA..MALAYALAM DIGIT NINE
0E01..0E5B	# ก..๛;	THAI CHARACTER KO KAI..THAI CHARACTER KHOMUT
0E81..0EDD	# ກ..ໝ;	LAO LETTER KO..LAO HO MO
0F00..0FB9	# ༀ..◦ྐྵ;	TIBETAN SYLLABLE OM..TIBETAN SUBJOINED LETTER KSSA
10A0..10F6	# Ⴀ..ჶ;	GEORGIAN CAPITAL LETTER AN..GEORGIAN LETTER FI
10FB		# ჻;	GEORGIAN PARAGRAPH SEPARATOR
1E00..1EF9	# Ḁ..ỹ;	LATIN CAPITAL LETTER A WITH RING BELOW..LATIN SMALL LETTER Y WITH TILDE
1F00..1FFE	# ἀ..῾;	GREEK SMALL LETTER ALPHA WITH PSILI..GREEK DASIA
2000..200F	#  ..‏;	EN QUAD..RIGHT-TO-LEFT MARK
2011..2012	# ‑..‒;	NON-BREAKING HYPHEN..FIGURE DASH
2017		# ‗;	DOUBLE LOW LINE
201A..201B	# ‚..‛;	SINGLE LOW-9 QUOTATION MARK..SINGLE HIGH-REVERSED-9 QUOTATION MARK
201E..201F	# „..‟;	DOUBLE LOW-9 QUOTATION MARK..DOUBLE HIGH-REVERSED-9 QUOTATION MARK
2022..2024	# •..․;	BULLET..ONE DOT LEADER
2028..202E	#  ..‮;	LINE SEPARATOR..RIGHT-TO-LEFT OVERRIDE
2031		# ‱;	PER TEN THOUSAND SIGN
2034		# ‴;	TRIPLE PRIME
2036..203A	# ‶..›;	REVERSED DOUBLE PRIME..SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
203C..2046	# ‼..⁆;	DOUBLE EXCLAMATION MARK..RIGHT SQUARE BRACKET WITH QUILL
206A..2070	# ⁪..⁰;	INHIBIT SYMMETRIC SWAPPING..SUPERSCRIPT ZERO
2075..207E	# ⁵..⁾;	SUPERSCRIPT FIVE..SUPERSCRIPT RIGHT PARENTHESIS
2080		# ₀;	SUBSCRIPT ZERO
2085..208E	# ₅..₎;	SUBSCRIPT FIVE..SUBSCRIPT RIGHT PARENTHESIS
20A0..20A8	# ₠..₨;	EURO-CURRENCY SIGN..RUPEE SIGN
20AA..20AB	# ₪..₫;	NEW SHEQEL SIGN..DONG SIGN
20D0..2102	# ◦⃐..ℂ;	COMBINING LEFT HARPOON ABOVE..DOUBLE-STRUCK CAPITAL C
2104		# ℄;	CENTRE LINE SYMBOL
2106..2108	# ℆..℈;	CADA UNA..SCRUPLE
210A..2112	# ℊ..ℒ;	SCRIPT SMALL G..SCRIPT CAPITAL L
2114..2115	# ℔..ℕ;	L B BAR SYMBOL..DOUBLE-STRUCK CAPITAL N
2117..2120	# ℗..℠;	SOUND RECORDING COPYRIGHT..SERVICE MARK
2123..2125	# ℣..℥;	VERSICLE..OUNCE SIGN
2127..212A	# ℧..K;	INVERTED OHM SIGN..KELVIN SIGN
212C..2138	# ℬ..ℸ;	SCRIPT CAPITAL B..DALET SYMBOL
2155..215A	# ⅕..⅚;	VULGAR FRACTION ONE FIFTH..VULGAR FRACTION FIVE SIXTHS
215F		# ⅟;	FRACTION NUMERATOR ONE
216C..216F	# Ⅼ..Ⅿ;	ROMAN NUMERAL FIFTY..ROMAN NUMERAL ONE THOUSAND
217A..2182	# ⅺ..ↂ;	SMALL ROMAN NUMERAL ELEVEN..ROMAN NUMERAL TEN THOUSAND
219A..21D1	# ↚..⇑;	LEFTWARDS ARROW WITH STROKE..UPWARDS DOUBLE ARROW
21D3		# ⇓;	DOWNWARDS DOUBLE ARROW
21D5..21EA	# ⇕..⇪;	UP DOWN DOUBLE ARROW..UPWARDS WHITE ARROW FROM BAR
2201		# ∁;	COMPLEMENT
2204..2206	# ∄..∆;	THERE DOES NOT EXIST..INCREMENT
2209..220A	# ∉..∊;	NOT AN ELEMENT OF..SMALL ELEMENT OF
220C..220E	# ∌..∎;	DOES NOT CONTAIN AS MEMBER..END OF PROOF
2210		# ∐;	N-ARY COPRODUCT
2212..2214	# −..∔;	MINUS SIGN..DOT PLUS
2216..2219	# ∖..∙;	SET MINUS..BULLET OPERATOR
221B..221C	# ∛..∜;	CUBE ROOT..FOURTH ROOT
2221..2222	# ∡..∢;	MEASURED ANGLE..SPHERICAL ANGLE
2224		# ∤;	DOES NOT DIVIDE
2226		# ∦;	NOT PARALLEL TO
222D		# ∭;	TRIPLE INTEGRAL
222F..2233	# ∯..∳;	SURFACE INTEGRAL..ANTICLOCKWISE CONTOUR INTEGRAL
2238..223B	# ∸..∻;	DOT MINUS..HOMOTHETIC
223E..2247	# ∾..≇;	INVERTED LAZY S..NEITHER APPROXIMATELY NOR ACTUALLY EQUAL TO
2249..224B	# ≉..≋;	NOT ALMOST EQUAL TO..TRIPLE TILDE
224D..2251	# ≍..≑;	EQUIVALENT TO..GEOMETRICALLY EQUAL TO
2253..225F	# ≓..≟;	IMAGE OF OR APPROXIMATELY EQUAL TO..QUESTIONED EQUAL TO
2262..2263	# ≢..≣;	NOT IDENTICAL TO..STRICTLY EQUIVALENT TO
2268..2269	# ≨..≩;	LESS-THAN BUT NOT EQUAL TO..GREATER-THAN BUT NOT EQUAL TO
226C..226D	# ≬..≭;	BETWEEN..NOT EQUIVALENT TO
2270..2281	# ≰..⊁;	NEITHER LESS-THAN NOR EQUAL TO..DOES NOT SUCCEED
2284..2285	# ⊄..⊅;	NOT A SUBSET OF..NOT A SUPERSET OF
2288..2294	# ⊈..⊔;	NEITHER A SUBSET OF NOR EQUAL TO..SQUARE CUP
2296..2298	# ⊖..⊘;	CIRCLED MINUS..CIRCLED DIVISION SLASH
229A..22A4	# ⊚..⊤;	CIRCLED RING OPERATOR..DOWN TACK
22A6..22BE	# ⊦..⊾;	ASSERTION..RIGHT ANGLE WITH ARC
22C0..2311	# ⋀..⌑;	N-ARY LOGICAL AND..SQUARE LOZENGE
2313..244A	# ⌓..⑊;	SEGMENT..OCR DOUBLE BACKSLASH
24B6..24CF	# Ⓐ..Ⓩ;	CIRCLED LATIN CAPITAL LETTER A..CIRCLED LATIN CAPITAL LETTER Z
24EA		# ⓪;	CIRCLED DIGIT ZERO
254C..254F	# ╌..╏;	BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL..BOX DRAWINGS HEAVY DOUBLE DASH VERTICAL
2575..2580	# ╵..▀;	BOX DRAWINGS LIGHT UP..UPPER HALF BLOCK
2590..2591	# ▐..░;	RIGHT HALF BLOCK..LIGHT SHADE
25A2		# ▢;	WHITE SQUARE WITH ROUNDED CORNERS
25AA..25B1	# ▪..▱;	BLACK SMALL SQUARE..WHITE PARALLELOGRAM
25B4..25B5	# ▴..▵;	BLACK UP-POINTING SMALL TRIANGLE..WHITE UP-POINTING SMALL TRIANGLE
25B8..25BB	# ▸..▻;	BLACK RIGHT-POINTING SMALL TRIANGLE..WHITE RIGHT-POINTING POINTER
25BE..25BF	# ▾..▿;	BLACK DOWN-POINTING SMALL TRIANGLE..WHITE DOWN-POINTING SMALL TRIANGLE
25C2..25C5	# ◂..◅;	BLACK LEFT-POINTING SMALL TRIANGLE..WHITE LEFT-POINTING POINTER
25C9..25CA	# ◉..◊;	FISHEYE..LOZENGE
25CC..25CD	# ◌..◍;	DOTTED CIRCLE..CIRCLE WITH VERTICAL FILL
25D2..25E1	# ◒..◡;	CIRCLE WITH LOWER HALF BLACK..LOWER HALF CIRCLE
25E6..25EE	# ◦..◮;	WHITE BULLET..UP-POINTING TRIANGLE WITH RIGHT HALF BLACK
2600..2604	# ☀..☄;	BLACK SUN WITH RAYS..COMET
2607..2608	# ☇..☈;	LIGHTNING..THUNDERSTORM
260A..260D	# ☊..☍;	ASCENDING NODE..OPPOSITION
2610..261B	# ☐..☛;	BALLOT BOX..BLACK RIGHT POINTING INDEX
261D		# ☝;	WHITE UP POINTING INDEX
261F..263F	# ☟..☿;	WHITE DOWN POINTING INDEX..MERCURY
2641		# ♁;	EARTH
2643..265F	# ♃..♟;	JUPITER..BLACK CHESS PAWN
2662		# ♢;	WHITE DIAMOND SUIT
2666		# ♦;	BLACK DIAMOND SUIT
266B		# ♫;	BEAMED EIGHTH NOTES
266E		# ♮;	MUSIC NATURAL SIGN
2701..27BE	# ✁..➾;	UPPER BLADE SCISSORS..OPEN-OUTLINED RIGHTWARDS ARROW
3105..312C	# ㄅ..ㄬ;	BOPOMOFO LETTER B..BOPOMOFO LETTER GN
FB00..FB06	# ﬀ..ﬆ;	LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
FB13..FB17	# ﬓ..ﬗ;	ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
FB1E..FDFB	# ◦ﬞ..ﷻ;	HEBREW POINT JUDEO-SPANISH VARIKA..ARABIC LIGATURE JALLAJALALOUHOU
FE20..FE23	# ◦︠..◦︣;	COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF
FE70..FEFC	# ﹰ..ﻼ;	ARABIC FATHATAN ISOLATED FORM..ARABIC LIGATURE LAM WITH ALEF FINAL FORM
FEFF		# ;	ZERO WIDTH NO-BREAK SPACE
FF65..FFDC	# ･..ￜ;	HALFWIDTH KATAKANA MIDDLE DOT..HALFWIDTH HANGUL LETTER I
FFE8..FFEE	# ￨..￮;	HALFWIDTH FORMS LIGHT VERTICAL..HALFWIDTH WHITE CIRCLE
FFFC..FFFD	# ..�;	OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
W - Wide
1100..11F9	# ᄀ..ᇹ;	HANGUL CHOSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
3000..303F	# 　..〿;	IDEOGRAPHIC SPACE..IDEOGRAPHIC HALF FILL SPACE
3041..3094	# ぁ..ゔ;	HIRAGANA LETTER SMALL A..HIRAGANA LETTER VU
3099..309E	# ◦゙..ゞ;	COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..HIRAGANA VOICED ITERATION MARK
30A1..30FE	# ァ..ヾ;	KATAKANA LETTER SMALL A..KATAKANA VOICED ITERATION MARK
3131..318E	# ㄱ..ㆎ;	HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
3190..319F	# ㆐..㆟;	IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRAPHIC ANNOTATION MAN MARK
3200..321C	# ㈀..㈜;	PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED HANGUL CIEUC U
3220..3243	# ㈠..㉃;	PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH REACH
3260..32B0	# ㉠..㊰;	CIRCLED HANGUL KIYEOK..CIRCLED IDEOGRAPH NIGHT
32C0..3376	# ㋀..㍶;	IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY..SQUARE PC
337B..33DD	# ㍻..㏝;	SQUARE ERA NAME HEISEI..SQUARE WB
33E0..33FE	# ㏠..㏾;	IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ONE..IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY THIRTY-ONE
4E00..9FA5	# 一..龥;	(!CJK Ideograph, First!)..(!CJK Ideograph, Last!)
AC00..D7A3	# 가..힣;	(!Hangul Syllable, First!)..(!Hangul Syllable, Last!)
E000..E757	# ..;	(!Private Use, First!)..(!Private Use, First!)
F900..FA2D	# 豈..鶴;	CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA2D
F - FullWidth
FE30..FE44	# ︰..﹄;	PRESENTATION FORM FOR VERTICAL TWO DOT LEADER..PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
FE49..FE52	# ﹉..﹒;	DASHED OVERLINE..SMALL FULL STOP
FE54..FE6B	# ﹔..﹫;	SMALL SEMICOLON..SMALL COMMERCIAL AT
FF01..FF5E	# ！..～;	FULLWIDTH EXCLAMATION MARK..FULLWIDTH TILDE
FFE0..FFE6	# ￠..￦;	FULLWIDTH CENT SIGN..FULLWIDTH WON SIGN

Background

What ISO/IEC 10646:1993 says

ISO 10646 is silent on the terms "half-width" and "full-width" except to say that the characters so named are provided for compatibility.

What the Unicode Standard, Version 2.1 says

The Unicode Standard states (p. 6-130):

In the context of conversion to and from such mixed-width encodings, all characters in the General Scripts area [i.e. 0000-1FFF] should be construed as half-width (hankaku) characters.

This sentence, as it stands, is misleading in that it implies that everything in the range U+0000..U+1FFF is half-width.

All characters in the CJK Phonetics and Symbols area [i.e. 3000-33FF] and the Unified CJK Ideograph area [i.e. 4E00-9FFF], along with the characters in the CJK Compatibility Ideographs [i.e. F900-FAFF], CJK Compatibility Forms [i.e. FE30-FE4F], and Small Form Variants blocks [i.e. FE50-FE6F], should be construed as full-width (zenkaku) characters. Other Compatibility Area [i.e. F900-FFFF] characters outside of the current block should be construed as half-width characters. The characters of the Symbols Area are neutral regarding their width semantics.

It should be clearly noted that statements made in the Unicode Standard in Chapter 6 (Character Block Descriptions) do not have normative status. Chapters 3, 4, and 7 (Charts) have normative status. The rest of the book, including Chapter 6, is provided to give as much information as possible to help people understand and implement the characters correctly. But it is dangerous to make legalistic arguments based on the text of Chapter 6, since there is rather large leeway for the editors of the Unicode Standard to modify and augment such explanatory text as new issues arise or old ones require more clarification.

The intent of the existing paragraph is not to create a property but to account for the fact that there are full-width forms encoded in the ranges U+FF01..U+FF5E and U+FFE0..U+FFE6.

Acknowledgments

Michel Suignard provided extensive input into the analysis and source material for the detail assignments of these properties.

Authors

Asmus Freytag wrote the main document. Ken Whistler provided the base text for the background section. Mark Davis provided the UTF-8 and names annotations.

Changes from previous revisions:

First draft technical report version. Extensive formatting to fit the template. Split Wide into Wide and FullWidth to capture the characters with explicit FullWidth characteristics.

First Technical Report Version. Remove list of 'unassigned' characters. Add some informative text and make other editorial changes requested at UTC meeting #78.

Second Technical Report Version. Added UTF-8 and names annotations to the table. Minor wording changes. HTML fixes.

Copyright

Copyright © 1998-1998 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.

Unicode Home Page: http://www.unicode.org

Unicode Technical Reports: http://www.unicode.org/unicode/reports/techreports.html
