From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Aug 20 2005 - 05:38:10 CDT
From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
> I can understand the gripes about 'level-2' v. 'level-1' implementation,
> though. I find it distinctly irritating that the newly added Tamil
> consonant SHA U+0BB6 won't combine with vowels in Windows XP, and seems
> unlikely to unless one buys otherwise unneeded word processing packages.
I understand that too: if one can demonstrate that Tamil is correctly
handled by a level-1-only implementation (yes, this can be tested using
PUAs), then this will establish the correct processing rules for handling
Tamil as it is currently encoded in Unicode.
So it is up to Unicode to verify that processing based on the current
standard encoding is consistent with the level-1 implementation based on
PUAs. This could be tested by building a mapping table between the two
representations, then comparing the results of the level-2 implementation
using standard Unicode with those of the level-1 implementation using the
"New Tamil" PUA block.
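As an illustration of the mapping-table idea, here is a minimal sketch of a hypothetical "New Tamil" PUA block in which each consonant + vowel-sign syllable gets a single PUA code point, so that a level-1 renderer could show it as one glyph. The Tamil code points (KA U+0B95, SHA U+0BB6, vowel signs AA/I/II U+0BBE..U+0BC0, virama U+0BCD) are real; the PUA base U+E000, the tiny syllable inventory, and the names `TO_PUA`, `unicode_to_pua`, `pua_to_unicode` and `round_trips` are assumptions made up for this example. The round-trip check only confirms that the table itself is consistent; comparing the rendered output of the two implementations still has to be done visually.

```python
# Hypothetical "New Tamil" PUA allocation (assumption, not a real standard):
# one PUA code point per (consonant + vowel sign) syllable.
TAMIL_CONSONANTS = [0x0B95, 0x0BB6]                    # KA, SHA
TAMIL_VOWEL_SIGNS = [0x0BBE, 0x0BBF, 0x0BC0, 0x0BCD]   # AA, I, II, virama (pulli)
PUA_BASE = 0xE000                                      # arbitrary start of the block


def build_tables():
    """Assign consecutive PUA code points to every consonant+sign pair."""
    to_pua = {}
    code = PUA_BASE
    for c in TAMIL_CONSONANTS:
        for v in TAMIL_VOWEL_SIGNS:
            to_pua[chr(c) + chr(v)] = chr(code)
            code += 1
    from_pua = {p: seq for seq, p in to_pua.items()}
    return to_pua, from_pua


TO_PUA, FROM_PUA = build_tables()


def unicode_to_pua(text: str) -> str:
    """Convert standard Unicode Tamil to the PUA form, matching pairs greedily."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in TO_PUA:
            out.append(TO_PUA[pair])
            i += 2
        else:
            out.append(text[i])  # unmapped characters pass through unchanged
            i += 1
    return "".join(out)


def pua_to_unicode(text: str) -> str:
    """Expand each PUA syllable back to its standard Unicode sequence."""
    return "".join(FROM_PUA.get(ch, ch) for ch in text)


def round_trips(text: str) -> bool:
    """True if standard -> PUA -> standard gives back the original text."""
    return pua_to_unicode(unicode_to_pua(text)) == text
```

For example, `unicode_to_pua("\u0BB6\u0BBF\u0B95\u0BBE")` (SHA+I, KA+AA) yields two PUA characters, U+E005 and U+E000. The round trip only holds for input that does not already contain characters from the PUA range used by the table.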
But one must also verify that this will be consistent with the Indian ISCII
standard for Tamil. (There may be a few quirks in exceptional cases that do
not normally occur in human language, so they won't matter there.)
Another option would be to develop a "New Tamil" charset for testing, and
to establish a mapping table between it and ISCII (this does not require
allocating PUAs). Once this works, one can then define the correct mapping
table between "New Tamil" and standard Unicode (without using PUAs!).
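The derivation step described above amounts to composing two tables: "New Tamil" to ISCII, then ISCII to Unicode. The sketch below shows that composition under loudly stated assumptions. The "New Tamil" byte values 0x80..0x83 are entirely hypothetical, and the ISCII byte values (KA 0xB3, vowel signs 0xDA/0xDB, halant 0xE8) are given as believed ISCII-91 positions that should be checked against IS 13194:1991. The example also ignores ISCII's mechanism for selecting the Tamil script. The names `compose`, `is_injective` and `decode_new_tamil` are illustrative only.

```python
# Hypothetical 8-bit "New Tamil" charset: each byte encodes a whole syllable.
NEW_TAMIL_TO_ISCII = {
    0x80: bytes([0xB3]),        # KA (with inherent vowel)
    0x81: bytes([0xB3, 0xDA]),  # KA + vowel sign AA
    0x82: bytes([0xB3, 0xDB]),  # KA + vowel sign I
    0x83: bytes([0xB3, 0xE8]),  # KA + halant (Tamil pulli)
}

# ISCII bytes (assumed ISCII-91 values, to be verified) -> Unicode Tamil.
ISCII_TO_UNICODE = {
    0xB3: "\u0B95",  # TAMIL LETTER KA
    0xDA: "\u0BBE",  # TAMIL VOWEL SIGN AA
    0xDB: "\u0BBF",  # TAMIL VOWEL SIGN I
    0xE8: "\u0BCD",  # TAMIL SIGN VIRAMA
}


def compose(new_to_iscii, iscii_to_uni):
    """Derive New Tamil -> Unicode by chaining the two tables."""
    table = {}
    for byte, iscii_seq in new_to_iscii.items():
        # A KeyError here exposes a gap in the ISCII table that must be fixed first.
        table[byte] = "".join(iscii_to_uni[b] for b in iscii_seq)
    return table


NEW_TAMIL_TO_UNICODE = compose(NEW_TAMIL_TO_ISCII, ISCII_TO_UNICODE)


def is_injective(table):
    """Distinct bytes must map to distinct Unicode sequences to be reversible."""
    return len(set(table.values())) == len(table)


def decode_new_tamil(data: bytes) -> str:
    """Decode a New Tamil byte string to standard Unicode (no PUAs involved)."""
    return "".join(NEW_TAMIL_TO_UNICODE[b] for b in data)
```

Because the composed table never passes through PUA code points, the final "New Tamil" to Unicode mapping can be published as-is once the injectivity check (and visual testing) succeeds.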
Although I don't like the idea of publishing new 8-bit charset standards, it
certainly helps when it reduces the number of cases that must be tested and
supported in order to handle a script or language correctly.
This archive was generated by hypermail 2.1.5 : Sat Aug 20 2005 - 05:40:11 CDT