From: Doug Ewell (dewell@adelphia.net)
Date: Tue Dec 26 2006 - 08:02:33 CST
Arne Götje (高盛華) <arne at linux dot org dot tw> wrote:
>> See the often-cited examples of "ch" in Spanish and Czech. The fact
>> that two existing characters combine to make a single "letter" in an
>> orthography does not justify encoding the combination as a separate
>> character. Most of the existing examples where this was done in
>> Unicode were to achieve some 1-to-1 convertibility goal in Unicode
>> 1.0, and do not represent a precedent for future encoding.
>
> no, this is not the same. the 'ġ' letter does not exist in the
> alphabet, but 'nġ' is a separate letter and has to be treated as such.
> For example: when searching for 'n' in a document it is *not*
> appropriate that 'nġ' shows up.
> Also when typing and deleting the 'nġ' letter, it has to be removed as
> a whole.
> For sorting issues: it is *not* appropriate for 'nġ' to be sorted
> after 'n'. See the links I posted earlier.
>
> So, this is clearly *not* a combination of two existing letters, but a
> letter on its own.
You and I are both correct: the *letter* "nġ" in Amis and Paiwan
consists of the two *Unicode characters* U+006E and U+0121. There is
not necessarily a 1-to-1 correspondence between "Unicode characters" and
"letters in the alphabet used by a particular language."
All of the issues you described that involve searching, sorting, and
user interface can be implemented without encoding "nġ" as a separate
character.
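
As a rough illustration of that point, here is a minimal Python sketch that keeps "nġ" as the two characters U+006E and U+0121 in the text, and handles it as one letter only at the searching and sorting layer. The ALPHABET list and the helper names (letters, contains_letter, sort_key) are hypothetical and exist only for this example. In particular, the letter order is a placeholder, not the real Amis or Paiwan alphabet.

```python
import unicodedata

NG = "n\u0121"  # the letter "nġ": U+006E followed by U+0121

# Hypothetical toy alphabet, for illustration only. The position of "nġ"
# here is a placeholder and does not claim to be the real letter order.
ALPHABET = ["a", "i", NG, "n", "u"]


def letters(text, alphabet=ALPHABET):
    """Split text into orthographic letters, preferring the longest match."""
    # Normalize first so that g + U+0307 and U+0121 are treated alike.
    text = unicodedata.normalize("NFC", text)
    units = sorted(alphabet, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for unit in units:
            if text.startswith(unit, i):
                out.append(unit)
                i += len(unit)
                break
        else:
            # Not in the alphabet: keep the single code point as its own unit.
            out.append(text[i])
            i += 1
    return out


def contains_letter(text, letter):
    """Letter-level search: looking for "n" does not match inside "nġ"."""
    return letter in letters(text)


def sort_key(word, alphabet=ALPHABET):
    """Sort key that ranks each letter by its position in the alphabet."""
    rank = {unit: pos for pos, unit in enumerate(alphabet)}
    return [rank.get(unit, len(alphabet)) for unit in letters(word)]
```

With these definitions, contains_letter("an\u0121u", "n") is false while contains_letter("anu", "n") is true, and sorted(words, key=sort_key) places words according to ALPHABET rather than code point order. Production software would more likely get this behaviour from a locale-specific collation tailoring than from a hand-written table like this one.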
> again: they are *not* two base letters but one 'nġ', where the dot gets
> replaced with the accent. Same issue as with the 'i' in European
> languages.
How do users in Amis and Paiwan type this letter on a typewriter or
computer keyboard?
>> This is what Lithuanian does, IIRC.
>
> If it should be this way, then I propose that all software shall be
> changed in such a way that, when a base glyph has one or more combining
> accents, the whole sequence shall be treated as *one* character, so,
> when deleting a combining accent all preceding characters up to the
> base character and following combining accents, which belong to the
> same sequence get deleted too.
That is already how proper Unicode-enabled software is supposed to work.
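
A simplified sketch of that deletion behaviour in Python, assuming a "sequence" means one base character plus any trailing combining marks with a nonzero canonical combining class. This is only an approximation of the grapheme cluster rules in UAX #29 (for example, it ignores combining marks whose class is 0), and delete_last_cluster is a hypothetical helper name, not an existing API.

```python
import unicodedata


def delete_last_cluster(text):
    """Remove the final base character together with its combining marks."""
    i = len(text)
    # Step back over trailing combining marks (nonzero combining class).
    while i > 0 and unicodedata.combining(text[i - 1]) != 0:
        i -= 1
    # Then drop the base character those marks were attached to, if any.
    return text[:max(i - 1, 0)]
```

For example, delete_last_cluster("ng\u0307") removes both the "g" and the combining dot above and leaves "n". Removing the whole letter "nġ" in one keystroke would be a further, language-specific rule layered on top of this.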
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages