Re: Taiwan Aboriginal Languages and Unicode support

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Dec 26 2006 - 08:02:33 CST

Next message: Werner LEMBERG: "Re: U+3401"

Previous message: Philippe Verdy: "Re: Taiwan Aboriginal Languages and Unicode support"
In reply to: Arne Götje (高盛華): "Re: Taiwan Aboriginal Languages and Unicode support"
Next in thread: Cristian Secară: "Re: Taiwan Aboriginal Languages and Unicode support"
Reply: Cristian Secară: "Re: Taiwan Aboriginal Languages and Unicode support"

Arne Götje (高盛華) <arne at linux dot org dot tw> wrote:

>> See the often-cited examples of "ch" in Spanish and Czech. The fact
>> that two existing characters combine to make a single "letter" in an
>> orthography does not justify encoding the combination as a separate
>> character. Most of the existing examples where this was done in
>> Unicode were to achieve some 1-to-1 convertibility goal in Unicode
>> 1.0, and do not represent a precedent for future encoding.
>
> no, this is not the same. the 'ġ' letter does not exist in the
> alphabet, but 'nġ' is a separate letter and has to be treated as such.
> For example: when searching for 'n' in a document it is *not*
> appropriate that 'nġ' shows up.
> Also when typing and deleting the 'nġ' letter, it has to be removed as
> a whole.
> For sorting issues: it is *not* appropriate for 'nġ' to be sorted
> after 'n'. See the links I posted earlier.
>
> So, this is clearly *not* a combination of two existing letters, but a
> letter on its own.

You and I are both correct: the *letter* "nġ" in Amis and Paiwan
consists of the two *Unicode characters* U+006E and U+0121. There is
not necessarily a 1-to-1 correspondence between "Unicode characters" and
"letters in the alphabet used by a particular language."
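
To make the distinction concrete, here is a small Python sketch (mine,
not from any standard tool) that lists the Unicode characters behind the
one letter "nġ". It uses only the standard unicodedata module; the
variable names are just for illustration.

```python
import unicodedata

# The letter under discussion: U+006E followed by U+0121.
letter = "n\u0121"

# One letter for the reader, two Unicode characters in storage.
code_points = [f"U+{ord(ch):04X}" for ch in letter]
names = [unicodedata.name(ch) for ch in letter]

print(len(letter))   # 2
print(code_points)   # ['U+006E', 'U+0121']
print(names)         # ['LATIN SMALL LETTER N', 'LATIN SMALL LETTER G WITH DOT ABOVE']
```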

All of the issues you described that involve searching, sorting, and
user interface can be implemented without encoding "nġ" as a separate
character.
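
As a rough illustration of how that might look, the Python sketch below
splits text into orthographic letters, keeping "nġ" together, and then
searches and sorts on those letters rather than on raw code points. The
function names and the ORDER list are my own assumptions; in particular
ORDER is a placeholder, not the real Amis or Paiwan alphabet. It handles
lowercase only, and real software would more likely use a tailored
collation than hand-written code like this.

```python
import unicodedata

# Multi-character letters to keep together. Assumption: only "nġ" here.
DIGRAPHS = ("n\u0121",)

# Placeholder alphabet order, for illustration only. Replace it with the
# real order of the language being handled.
ORDER = ["a", "n\u0121", "n", "u"]

def letters(text):
    """Split text into orthographic letters, treating "nġ" as one unit."""
    text = unicodedata.normalize("NFC", text)
    out, i = [], 0
    while i < len(text):
        for d in DIGRAPHS:
            if text.startswith(d, i):
                out.append(d)
                i += len(d)
                break
        else:
            out.append(text[i])
            i += 1
    return out

def contains_letter(text, letter):
    """Letter-level search: looking for "n" does not match inside "nġ"."""
    return unicodedata.normalize("NFC", letter) in letters(text)

def sort_key(word):
    """Sort by position in ORDER; unknown letters go last, by code point."""
    key = []
    for l in letters(word):
        if l in ORDER:
            key.append((ORDER.index(l), ""))
        else:
            key.append((len(ORDER), l))
    return key

print(contains_letter("n\u0121a", "n"))   # False
print(contains_letter("na", "n"))         # True
print(sorted(["na", "n\u0121a", "an"], key=sort_key))
```

Normalizing to NFC first means the letter is recognized whether the text
stores U+0121 directly or the decomposed form g + U+0307.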

> again: they are *not* two base letters but one 'nġ', where the dot gets
> replaced with the accent. Same issue as the 'i' in European
> languages.

How do users in Amis and Paiwan type this letter on a typewriter or
computer keyboard?

>> This is what Lithuanian does, IIRC.
>
> If it should be this way, then I propose that all software shall be
> changed in the way, that when a base glyph has one or more combining
> accents, the whole sequence shall be treated as *one* character, so,
> when deleting a combining accent all preceding characters up to the
> base character and following combining accents, which belong to the
> same sequence get deleted too.

That is already how proper Unicode-enabled software is supposed to work.
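
A simplified Python sketch of that behavior follows. It treats a base
character plus any following combining marks as one unit and deletes the
unit as a whole. This is only an approximation of the grapheme cluster
rules in Unicode Standard Annex #29, and the function name is made up
for this example.

```python
import unicodedata

def delete_last_cluster(text):
    """Remove the last base character together with its combining marks."""
    i = len(text)
    # Step back over trailing combining marks (general category M*).
    while i > 0 and unicodedata.category(text[i - 1]).startswith("M"):
        i -= 1
    # Then drop the base character those marks attach to.
    return text[:max(i - 1, 0)]

# "n" + "g" + U+0307 COMBINING DOT ABOVE: one step removes "g" and its dot.
print(delete_last_cluster("ng\u0307") == "n")   # True
```

Deleting the whole letter "nġ" in one keystroke would additionally need
language-specific knowledge that n and ġ belong together, which is an
editor or input-method decision rather than an encoding one.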

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

This archive was generated by hypermail 2.1.5 : Tue Dec 26 2006 - 08:04:40 CST