From: Arne Götje (高盛華) (arne@linux.org.tw)
Date: Mon Dec 25 2006 - 21:25:48 CST
Hi list,
I just returned from a trip to visit some of the local Taiwan aboriginal
tribes to evaluate the alphabets they use, whether Unicode already
supports all of their characters, and how to input them. At present they
can neither type nor display the characters correctly, which leads to
crude workarounds such as '^i' or '`d'.
So far, we have collected the characters used in the Amis and Paiwan
languages. We will visit the other tribes too to gather information from
them in the near future.
The full character lists are here:
* Amis: http://www.enricozini.org/2006/amis-character-list.html
* Paiwan: http://www.enricozini.org/2006/paiwan-character-list.html
We found three issues so far and I would like to have your advice on how
to deal with them.
The languages use the Latin script, thanks to Christian missionaries.
1. Instead of the letter 'g', they use the letter 'nġ'.
This is a separate letter and not a ligature. It gets sorted differently
in the Amis and Paiwan languages, and text processing needs to handle it
as a single letter.
My idea would be to encode this letter as a separate character, as it
has its own semantics. We could probably put it into one of the existing
Latin Extended blocks in Unicode.
2. With the character 'nġ': in Amis this character, like all others, can
take an acute, grave or circumflex accent. While we can use combining
accent sequences to produce such characters, for the 'nġ' the accent
needs to replace the dot on the g, similar to what happens with the 'i'
in European languages.
I suppose we need to encode a letter 'dotless ng' for this, like we have
with the 'i'.
3. In the Amis language, when the 'i' gets an acute, grave or circumflex
accent, it keeps its dot in place and the accent is stacked on top of
the dot.
However, fonts handling European scripts will probably take the i-dot
away and replace it with the accent, rather than stacking the accent on
top of it.
Do we need a separately encoded 'i' for this different semantic
purpose? Or is there a better way to solve this issue?
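
For items 2 and 3, it may help to look at the code point sequences
involved. The Python sketch below uses only the standard unicodedata
module; the helper names (codepoints, dotted_i) are made up for
illustration. It assumes the convention Unicode documents for
Lithuanian, where an explicit U+0307 COMBINING DOT ABOVE between the 'i'
and the accent asks the renderer to keep the dot. Whether a given font
really stacks the accent on top of the dot is up to that font, and
nothing here yields a dotless 'ġ'.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT


def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


def dotted_i(accent):
    """Hypothetical helper: an 'i' that should keep its dot under an accent."""
    return "i" + DOT_ABOVE + accent


# Plain 'i' + acute composes to the precomposed i-acute, where the
# accent replaces the dot.
plain = unicodedata.normalize("NFC", "i" + ACUTE)

# With an explicit dot there is no precomposed form, so the sequence
# survives NFC normalization unchanged.
kept = unicodedata.normalize("NFC", dotted_i(ACUTE))

# 'nġ' with an acute on the g, fully decomposed: n, g, dot above, acute.
ng_acute = unicodedata.normalize("NFD", "n\u0121" + ACUTE)

print(codepoints(plain))
print(codepoints(kept))
print(codepoints(ng_acute))
```

The last sequence shows why item 2 is harder than item 3: the dot on
the 'ġ' is itself a combining mark in the decomposed form, so a
renderer has no base letter without a dot to fall back on.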
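
For item 1, sorting 'nġ' as a letter of its own does not strictly
depend on how it is encoded; a tailored sort key can treat the sequence
as one unit. The Python sketch below is only an illustration. The
alphabet order is an assumption (the basic Latin letters with 'nġ'
placed right after 'n'), not the real Amis or Paiwan order, and the
sample words are made up.

```python
import string
import unicodedata

# 'nġ' as it appears after NFC normalization: n + U+0121.
DIGRAPH = unicodedata.normalize("NFC", "ng\u0307")

# ASSUMPTION: hypothetical alphabet, 'nġ' sorted directly after 'n'.
LETTERS = list(string.ascii_lowercase)
LETTERS.insert(LETTERS.index("n") + 1, DIGRAPH)
RANK = {letter: i for i, letter in enumerate(LETTERS)}


def tokenize(word):
    """Split a word into letters, keeping 'nġ' together as one letter."""
    word = unicodedata.normalize("NFC", word.lower())
    tokens = []
    i = 0
    while i < len(word):
        if word.startswith(DIGRAPH, i):
            tokens.append(DIGRAPH)
            i += len(DIGRAPH)
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def sort_key(word):
    """Map each letter to its rank; unknown characters sort last."""
    return [RANK.get(t, len(RANK)) for t in tokenize(word)]


words = ["n\u0121a", "nu", "na", "o"]  # made-up sample words
print(sorted(words, key=sort_key))
```

Because the key normalizes first, the decomposed spelling n + g +
U+0307 gets the same rank as the precomposed one, whereas a plain
code point sort would place the decomposed spelling before "nu".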
I don't really want to publish separate Latin fonts just for the Taiwan
Aboriginal Languages, but rather ask font maintainers to include support
for the currently unsupported accent combinations. That way we can have
more font styles supporting the script.
Any opinions about these issues?
Cheers
Arne
--
Arne Götje (高盛華) <arne@linux.org.tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.