From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Jan 14 2004 - 07:33:34 EST
On 13/01/2004 18:05, D. Starner wrote:
>Peter Kirk writes:
>
>
>>I agree that heuristics should be adjusted for Thai. But problems may
>>arise if they have to be adjusted individually, and without regression
>>errors, for all 6000+ world languages.
>>
>>
>
>Thai is hard because of the writing system. But most writing systems weren't
>encoded pre-Unicode, so if they were typed into a computer, it was with
>a Latin (or Cyrillic?) transliteration that probably used spaces and new lines,
>and in fact was probably ASCII.
>
>More cynically, those who use obscure character sets or font encodings have
>trouble viewing them; that is one of the reasons for Unicode. That this tool
>may to some extent be an example of that problem is a simple fact of life,
>and doesn't call for it to be thrown out.
>
>
Either you are confused or I am. I was not referring to pre-Unicode
legacy encodings. I was referring to Unicode plain text data, which may
(once Unicode includes all the necessary characters) be in any one of
6000+ languages, some of which have a variety of scripts and spelling
conventions. The problem is not that people are using obscure legacy
encodings, but that they are not defining their UTF text adequately.
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/