RE: NFD on u+AC00 contradicts NormalisationData.txt ?

From: Kent Karlsson ([email protected])
Date: Wed Jun 14 2006 - 18:14:59 CDT

Next message: Asmus Freytag: "Re: Some questions about Latin diacritics"

Previous message: David Faulks: "Unicode 5.1"
In reply to: Eric Muller: "Re: NFD on u+AC00 contradicts NormalisationData.txt ?"
Next in thread: Richard Wordingham: "Re: NFD on u+AC00 contradicts NormalisationData.txt ?"

> > But why isn't it listed in UnicodeData.txt?
> >
> Because that would double the size of the file, and because the
> decompositions are algorithmic and are most often implemented

More specifically: they are *arithmetic* even.

> that way (rather than being driven by tables).

But they COULD have been listed in the data file HangulSyllableType.txt
without adding a single line to that data file (just lengthening most
of the lines a little bit)...

AC00; LV; 1100 1161 # Lo HANGUL SYLLABLE GA
...
AC01..AC1B; LVT; AC00 11A8..AC00 11C2 # Lo [27] HANGUL SYLLABLE GAG..HANGUL SYLLABLE GAH
...

As with all canonical decompositions, there is a maximum of two characters
in the decomposition *mapping*, so an LVT syllable maps to its LV syllable
plus one trailing consonant, not directly to three Jamos. This matters for
the *composition* step in normalisation.
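
To make the arithmetic concrete, here is a minimal Python sketch of the
two-character mapping and the full decomposition derived from it. The
constants are the ones given in the Hangul section of the Unicode Standard;
the function names are my own and purely illustrative.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decomposition_mapping(cp):
    """Two-code-point canonical mapping of a precomposed Hangul syllable."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable: U+%04X" % cp)
    t_index = s_index % T_COUNT
    if t_index == 0:
        # LV syllable: leading consonant + vowel.
        l_index = s_index // N_COUNT
        v_index = (s_index % N_COUNT) // T_COUNT
        return (L_BASE + l_index, V_BASE + v_index)
    # LVT syllable: the corresponding LV syllable + trailing consonant.
    return (cp - t_index, T_BASE + t_index)


def full_decomposition(cp):
    """Apply the mapping recursively, giving two or three Jamos."""
    first, second = decomposition_mapping(cp)
    if S_BASE <= first < S_BASE + S_COUNT:
        return decomposition_mapping(first) + (second,)
    return (first, second)


def matches_library_nfd(cp):
    """Cross-check the sketch against Python's own NFD implementation."""
    expected = tuple(ord(c) for c in unicodedata.normalize("NFD", chr(cp)))
    return full_decomposition(cp) == expected
```

Because the mapping is pairwise, composition can run it in reverse one pair
at a time: L + V gives LV, then LV + T gives LVT.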

Note that the true letters (in origin) of Hangul are the Hangul Jamos that
consist of only one "element". The two- and three-"element" Jamos are
composites (though they have no formal decomposition in Unicode),
although some of the double ones may now be considered letters in their
own right.

(The "HANGUL LETTER"s are another story, which I will not expand on now,
other than to say that their formal compatibility decompositions aren't
useful in any context.)

Note also that the HANGUL SYLLABLEs, though many, do NOT include
ALL (historic) Hangul syllables.
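
A small illustration of that last point, as a sketch using Python's
unicodedata module (the helper name is my own): a modern Jamo pair composes
to a precomposed syllable under NFC, while a pair using the historic vowel
U+119E HANGUL JUNGSEONG ARAEA has no precomposed syllable and stays as Jamos.

```python
import unicodedata


def composes_to_syllable(jamos):
    """True if NFC turns the Jamo sequence into one precomposed syllable."""
    composed = unicodedata.normalize("NFC", jamos)
    return len(composed) == 1 and 0xAC00 <= ord(composed) <= 0xD7A3


modern = "\u1100\u1161"    # CHOSEONG KIYEOK + JUNGSEONG A
historic = "\u1100\u119E"  # CHOSEONG KIYEOK + JUNGSEONG ARAEA (historic vowel)
```

Here composes_to_syllable(modern) should be true and
composes_to_syllable(historic) false, since U+119E lies outside the vowel
range that the syllable arithmetic covers.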

/kent k
