RE: NFD on u+AC00 contradicts NormalisationData.txt ?

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Wed Jun 14 2006 - 18:14:59 CDT


    > > But why isn't it listed in UnicodeData.txt?
    > >
    > Because that would double the size of the file, and because the
    > decompositions are algorithmic and are most often implemented

    More specifically: they are *arithmetic* even.

    > that way (rather than being driven by tables).

    But they COULD have been listed in the datafile HangulSyllableType.txt
    without adding a single line to that datafile (just lengthening most
    of the lines a little bit)...

    AC00; LV; 1100 1161 # Lo HANGUL SYLLABLE GA
    ...
    AC01..AC1B; LVT; AC00 11A8..AC00 11C2 # Lo [27] HANGUL SYLLABLE GAG..HANGUL SYLLABLE GAH
    ...

    As with all canonical decompositions, there is a maximum of two characters
    in the decomposition *mapping*. This matters for the *composition*
    step in normalisation.
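
    To make the arithmetic concrete, here is a minimal sketch in Python. The
    constants are the ones given in the Hangul syllable section of the Unicode
    Standard; the function names are made up for illustration and are not
    taken from any library. It shows the pairwise mapping (LV to L V, LVT to
    LV T), the full decomposition obtained by applying it repeatedly, and the
    reverse pairwise composition.

```python
# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decomposition_mapping(ch):
    """Pairwise mapping of one character: LV -> L V, LVT -> LV T.

    Characters outside the HANGUL SYLLABLE block are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    t_index = s_index % T_COUNT
    if t_index == 0:
        # LV syllable: leading consonant + vowel.
        l = L_BASE + s_index // N_COUNT
        v = V_BASE + (s_index % N_COUNT) // T_COUNT
        return chr(l) + chr(v)
    # LVT syllable: the LV syllable it is built on + trailing consonant.
    return chr(S_BASE + s_index - t_index) + chr(T_BASE + t_index)


def full_decomposition(ch):
    """Apply the pairwise mapping until nothing decomposes further."""
    mapped = decomposition_mapping(ch)
    if mapped == ch:
        return ch
    return full_decomposition(mapped[0]) + mapped[1:]


def compose_pair(first, second):
    """Inverse of decomposition_mapping; None if the pair does not compose."""
    a, b = ord(first), ord(second)
    if L_BASE <= a < L_BASE + L_COUNT and V_BASE <= b < V_BASE + V_COUNT:
        return chr(S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT)
    s_index = a - S_BASE
    if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
            and T_BASE < b < T_BASE + T_COUNT):
        return chr(a + (b - T_BASE))
    return None
```

    With these definitions, decomposition_mapping("\uAC01") gives the pair
    AC00 11A8, while full_decomposition("\uAC01") gives 1100 1161 11A8.
    Because every mapping has exactly two characters, composition only ever
    has to look at one adjacent pair at a time.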

    Note that the true letters (in origin) of Hangul are the Hangul Jamos that
    consist of only one "element". The two- and three-"element" Jamos are
    composites (they don't have any formal decomposition in Unicode),
    although some of the double ones may now be considered letters.

    (The "HANGUL LETTER"s are another story, which I will not expand on now,
    other than to say that their formal compatibility decompositions aren't
    useful in any context.)

    Note also that the HANGUL SYLLABLEs, though many, do NOT include
    ALL (historic) Hangul syllables.
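
    A quick way to see this, sketched with Python's standard unicodedata
    module: a modern leading consonant plus vowel composes to a HANGUL
    SYLLABLE under NFC, while the same vowel after a historic leading
    consonant stays a jamo sequence. The choice of U+1140 (HANGUL CHOSEONG
    PANSIOS) as the historic example is mine.

```python
import unicodedata

modern = "\u1100\u1161"    # CHOSEONG KIYEOK + JUNGSEONG A
archaic = "\u1140\u1161"   # CHOSEONG PANSIOS (historic) + JUNGSEONG A

composed_modern = unicodedata.normalize("NFC", modern)
composed_archaic = unicodedata.normalize("NFC", archaic)

# The modern pair collapses to one precomposed syllable (U+AC00);
# the historic pair has no precomposed form and is left as two jamos.
print([f"{ord(c):04X}" for c in composed_modern])
print([f"{ord(c):04X}" for c in composed_archaic])
```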

            /kent k


