Should U+3248 ... U+324F be wide characters?
Asmus Freytag (c) via Unicode
unicode at unicode.org
Thu Aug 17 11:50:39 CDT 2017
On 8/17/2017 7:24 AM, Mike FABIAN wrote:
> Asmus Freytag via Unicode <unicode at unicode.org> wrote:
>> On 8/16/2017 6:26 AM, Mike FABIAN via Unicode wrote:
>>> EastAsianWidth.txt contains:
>>> 3248..324F;A # No  CIRCLED NUMBER TEN ON BLACK SQUARE..CIRCLED NUMBER EIGHTY ON BLACK SQUARE
>>> i.e. it classifies the width of the characters at code points
>>> between 3248 and 324F as ambiguous.
>>> Is this really correct? Shouldn’t they be “W”, i.e. wide?
>>> In most fonts these characters seem to be square-shaped wide characters.
>> "W" not only implies display width, but also a different treatment in the context of line
>> breaking and vertical layout of text.
>> "W" characters behave more like Ideographs, for the most part, while "N" are treated as
>> forming words (for the most part).
> Most emoji now have "W", for example:
> 1F600..1F64F;W # So  GRINNING FACE..PERSON WITH FOLDED HANDS
> That seems correct because emoji behave more like Ideographs.
> Isn’t this the same for “CIRCLED NUMBER TEN ON BLACK SQUARE”?
> This seems to me also more like an Ideograph.
>> "A" means, you get to decide whether to treat these as "W" or "N" based on context. If
>> used in a non-ideographic context, they behave like all other symbols (but happen to fill
>> an EM square).
"A" means, you get to decide whether to treat these as "W" or "N" based on context.
There's really no strong need to change an "A" to "W", because "A" doesn't get in your way if you decide that "W" works better for you.
Remember that all the EAW (East_Asian_Width) property values are supposed to be "resolved" down to W or N. For some, like Na, that resolution is deterministic; for A it is context- or application-dependent. But when you finally process your data, only W(ide) or N(arrow) remain after resolution.
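The resolution step described above can be sketched in Python. This is a minimal, hypothetical sketch, not code from any Unicode tool: `EawRange`, `parse_eaw_line`, and `resolve_eaw` are illustrative names, and the `east_asian_context` flag stands in for whatever context or application setting decides how A is resolved. It assumes the data-line format quoted above (optionally padded with spaces around ";") and the UAX #11 convention that F and W resolve to wide while H, Na, and N resolve to narrow.

```python
import re
from typing import NamedTuple, Optional

# Data lines look like "3248..324F;A # No  CIRCLED ..."; some versions of the
# file pad with spaces around ";", so whitespace is tolerated.
# Comment and blank lines simply fail to match.
_EAW_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(A|F|H|Na|N|W)\b"
)


class EawRange(NamedTuple):
    first: int  # first code point in the range
    last: int   # last code point (inclusive)
    value: str  # raw East_Asian_Width value: A, F, H, Na, N, or W


def parse_eaw_line(line: str) -> Optional[EawRange]:
    """Parse one EastAsianWidth.txt data line; return None for non-data lines."""
    m = _EAW_LINE.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    # A single code point has no "..LAST" part, so the range is one long.
    last = int(m.group(2), 16) if m.group(2) else first
    return EawRange(first, last, m.group(3))


def resolve_eaw(value: str, east_asian_context: bool) -> str:
    """Resolve a raw East_Asian_Width value down to "W" or "N".

    F and W always resolve to W; H, Na, and N always resolve to N.
    Only A depends on context, which the caller supplies as a flag.
    """
    if value in ("F", "W"):
        return "W"
    if value in ("H", "Na", "N"):
        return "N"
    if value == "A":
        return "W" if east_asian_context else "N"
    raise ValueError(f"unknown East_Asian_Width value: {value!r}")


if __name__ == "__main__":
    entry = parse_eaw_line(
        "3248..324F;A # No  CIRCLED NUMBER TEN ON BLACK SQUARE"
        "..CIRCLED NUMBER EIGHTY ON BLACK SQUARE"
    )
    print(entry)  # EawRange(first=12872, last=12879, value='A')
    print(resolve_eaw(entry.value, east_asian_context=False))  # N
    print(resolve_eaw(entry.value, east_asian_context=True))   # W
```

Because the data keeps these characters at A, an application that prefers to treat U+3248..U+324F as wide only has to resolve them with the context flag set; nothing in the property value forces the narrow reading.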