Asmus Freytag wrote:
> At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
>> There are some mistakes in the Unicode character
>> properties for Thai characters, and you have to "code around" them.
> And the mistakes are?
I've discussed a few of them on this list. I'll write
a more formal report on the issue later. Here are some headings:
Problems from Unicode properties
- An error in the combining classes of vowel signs makes normalization
worthless in some cases. This matters if you want to compare strings.
- The decomposition of SARA AM adds further problems to normalization.
- Some properties make grapheme clusters for Thai
incompatible with what Thai users expect, e.g. PHINTHU treated as
a virama, and SARA AM not being a combining character.
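To make the normalization points above concrete, here is a minimal Python sketch using only the standard unicodedata module. It shows one reading of the problem, not necessarily the exact cases meant above: the below-base vowel SARA U has a non-zero combining class and so reorders against a tone mark, while the above-base vowel SARA I has class 0 and does not. The helper name nfc_equal and the sample syllables are assumptions chosen for illustration.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (consonant)
NO_NU = "\u0e19"    # THAI CHARACTER NO NU (consonant)
SARA_I = "\u0e34"   # above-base vowel sign
SARA_U = "\u0e38"   # below-base vowel sign
MAI_EK = "\u0e48"   # tone mark
MAI_THO = "\u0e49"  # tone mark
SARA_AM = "\u0e33"  # vowel with a compatibility decomposition


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Canonical combining classes as reported by the Unicode Character Database.
classes = {
    "SARA I": unicodedata.combining(SARA_I),
    "SARA U": unicodedata.combining(SARA_U),
    "MAI EK": unicodedata.combining(MAI_EK),
}

# Vowel and tone mark typed in either order: the two spellings only
# normalize to the same string when both marks have non-zero classes.
u_orders_match = nfc_equal(KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U)
i_orders_match = nfc_equal(KO_KAI + SARA_I + MAI_EK, KO_KAI + MAI_EK + SARA_I)

# SARA AM is untouched by NFC but is split by the compatibility forms,
# which leaves NIKHAHIT after the tone mark.
nam = NO_NU + MAI_THO + SARA_AM
nam_nfc = unicodedata.normalize("NFC", nam)
nam_nfkc = unicodedata.normalize("NFKC", nam)

if __name__ == "__main__":
    print(classes)
    print("SARA U orders match:", u_orders_match)
    print("SARA I orders match:", i_orders_match)
    print([f"U+{ord(c):04X}" for c in nam_nfkc])
```

If string comparison has to tolerate both typing orders, a sketch like this suggests that normalization alone is not enough and that an application-level reordering step would be needed on top of it.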
Inaccuracies in the Unicode book
- Backspace is said to 'always' use the same (grapheme cluster) character
boundary as Del and the left/right arrows. In practice, Thai users expect
backspace to delete a single character, not the whole cluster. So the
character boundary for backspace should be locale-specific.
- In Thai, the zero width space is said to be able to expand in a fully
justified paragraph. In practice it is always zero width.
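The backspace point can be sketched as two separate editing operations. This is a deliberately simplified illustration: it approximates a cluster as one base character plus any following combining marks, which is not the full grapheme cluster definition in the Unicode standard, and the function names are hypothetical.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def backspace_code_point(text: str, caret: int) -> tuple[str, int]:
    """Backspace as described for Thai: remove exactly one code point
    before the caret, even if it is a vowel sign or tone mark."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1


def delete_cluster(text: str, caret: int) -> tuple[str, int]:
    """Forward delete: remove the character at the caret together with
    any combining marks that follow it (simplified cluster)."""
    if caret >= len(text):
        return text, caret
    end = caret + 1
    while end < len(text) and is_mark(text[end]):
        end += 1
    return text[:caret] + text[end:], caret


if __name__ == "__main__":
    syllable = "\u0e01\u0e39\u0e48"  # KO KAI + SARA UU + MAI EK
    print(backspace_code_point(syllable, len(syllable)))
    print(delete_cluster(syllable, 0))
```

Keeping the two operations separate lets an editor choose the backspace behaviour per locale while Del and the arrow keys keep using cluster boundaries.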
These are things you have to know after learning the Unicode standard
if you plan to work with the Thai language, so that you can 'code around'
the problems and make your software acceptable to Thai people.
I plan to write a formal report on the issue, not to change the standard,
but to note what is wrong and what has to be coded around. That way, people
who want to work with the Thai language (like you) will know the right thing
to do and will not repeat the mistakes made in some software.
-- Samphan Raruenrom Information Research and Development Division, National Electronics and Computer Technology Center, Thailand. http://www.nectec.or.th/home/index.html
This archive was generated by hypermail 2.1.2 : Tue Jul 16 2002 - 08:36:19 EDT