Grapheme clusters and east asian width
richard.wordingham at ntlworld.com
Wed Sep 16 19:19:39 CDT 2015
On Wed, 16 Sep 2015 22:56:42 +0100
Daniel Bünzli <daniel.buenzli at erratique.ch> wrote:
> On Wednesday, 16 September 2015 at 22:14, Asmus Freytag (t) wrote:
> > "N" doesn't mean "narrow" but "neutral" - that is, the width is
> > given by other considerations.
> Ah right! Thanks. Narrow is Na.
> So a refined algorithm would be to actually do the summation in each
> grapheme cluster, as I initially wanted to do, with the mapping (F, W
> -> 2), (Na, H -> 1), (N -> 0), and if I get a 0, fall back on 1 or maybe
> try to make an educated guess according to the script/block.
I think you have a problem with U+302E HANGUL SINGLE DOT TONE MARK and
U+302F HANGUL DOUBLE DOT TONE MARK, contrary to what I said earlier.
They are preposed combining marks with Grapheme_Extend=Yes and
EAW=Wide. I'm not sure whether the (legacy and extended) grapheme
cluster <U+AC00, U+302E> should occupy 2, 3 or 4 cells. I think 2 cells
is wrong, so summation works better here.
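For concreteness, here is a minimal Python sketch of the per-cluster
summation discussed above. It assumes the grapheme cluster has already
been segmented and is passed in as a string. The names CELLS and
cluster_width are hypothetical, and counting EAW=A as 0 is my own
assumption, since the quoted mapping does not cover it.

```python
import unicodedata

# Cell counts per East_Asian_Width value, following the mapping quoted above.
# "A" (Ambiguous) is not covered by that mapping; counting it as 0 is an
# assumption made here for illustration only.
CELLS = {"F": 2, "W": 2, "Na": 1, "H": 1, "N": 0, "A": 0}


def cluster_width(cluster: str) -> int:
    """Sum the per-character cell counts of one grapheme cluster."""
    total = sum(CELLS[unicodedata.east_asian_width(ch)] for ch in cluster)
    # A sum of 0 means no character contributed any width: fall back to 1.
    return total if total > 0 else 1


if __name__ == "__main__":
    # <U+AC00, U+302E>: Hangul syllable followed by the single dot tone mark.
    print(cluster_width("\uAC00\u302E"))
```

With both characters reported as EAW=Wide, as described above, the sum
for <U+AC00, U+302E> comes out as 4 cells. Whether 3 or 4 is the right
answer for display is exactly the open question.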
Does anyone know how EAW=Wide was derived for these characters?
Apparently they were wide even when they were non-spacing marks
(gc=Mn), e.g. in Unicode Version 5.0, so I suspect they were not given
individual consideration. I suspect they should be EAW=A(mbiguous).