From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Dec 05 2003 - 17:50:00 EST
On 05/12/2003 14:01, Philippe Verdy wrote:
> ...
>
>It's just a shame that what was considered equivalent in the Korean
>standards is considered canonically distinct (and even compatibility
>distinct) in Unicode. This means that the same abstract Korean text
>can have two distinct representations in Unicode, and there's no way to match
>these Unicode representations together. It also means that when mapping Korean
>charsets to Unicode, care must be taken, before making the mapping, that
>compound jamos are used whenever possible.
>
>
Agreed.
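To make the point concrete, here is a small check using Python's standard `unicodedata` module. It compares two spellings of the same initial consonant: two simple jamo U+1100 U+1100 (HANGUL CHOSEONG KIYEOK twice) and the single compound jamo U+1101 (HANGUL CHOSEONG SSANGKIYEOK). U+1101 has no canonical or compatibility decomposition, so under this assumption none of the four normalization forms should bring the two spellings together:

```python
import unicodedata

# Two spellings of the same initial consonant:
simple_pair = "\u1100\u1100"  # HANGUL CHOSEONG KIYEOK, twice (simple jamo)
compound = "\u1101"           # HANGUL CHOSEONG SSANGKIYEOK (compound jamo)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, simple_pair)
    b = unicodedata.normalize(form, compound)
    # Prints False for every form: neither canonical nor compatibility
    # normalization maps one spelling onto the other.
    print(form, a == b)
```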
>If the text is then stored and handled entirely in Unicode without returning
>to the KSC standard, you won't have any tool other than the UCA to collate
>strings (but collation does not produce strings, just collation weights,
>and there's currently no tool to convert a list of weights back into a
>Unicode string)...
>
>...
>
I note the following, which is part of the text explaining C10:
> All processes and higher-level protocols are required to abide by C10
> as a minimum. However, higher-level protocols may define additional
> equivalences that do not constitute modifications under that protocol.
> For example, a higher-level protocol may allow a sequence of spaces to
> be replaced by a single space.
Presumably a higher-level protocol could transform Korean text into a
standardised form, doing what (in your opinion and mine at least)
Unicode normalisation ought to have done.
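As a rough illustration of what such a protocol might do, the sketch below decomposes with NFD, greedily replaces adjacent pairs of simple conjoining jamo with the corresponding compound jamo, and then recomposes with NFC so that initial + medial (+ final) sequences become precomposed syllables. The pair table `COMPOUND_JAMO` is a hypothetical, deliberately partial list covering only a few initial and medial jamo. It is not taken from KS X 1001 or from Unicode data, and `compose_jamo`/`protocol_normalize` are illustrative names, not an existing API:

```python
import unicodedata

# Hypothetical, partial table of simple-jamo pairs that have a single
# compound jamo code point (initial and medial jamo only).
COMPOUND_JAMO = {
    ("\u1100", "\u1100"): "\u1101",  # KIYEOK + KIYEOK -> SSANGKIYEOK
    ("\u1103", "\u1103"): "\u1104",  # TIKEUT + TIKEUT -> SSANGTIKEUT
    ("\u1107", "\u1107"): "\u1108",  # PIEUP + PIEUP -> SSANGPIEUP
    ("\u1109", "\u1109"): "\u110A",  # SIOS + SIOS -> SSANGSIOS
    ("\u110C", "\u110C"): "\u110D",  # CIEUC + CIEUC -> SSANGCIEUC
    ("\u1169", "\u1161"): "\u116A",  # O + A -> WA
    ("\u1169", "\u1175"): "\u116C",  # O + I -> OE
    ("\u116E", "\u1165"): "\u116F",  # U + EO -> WEO
    ("\u116E", "\u1175"): "\u1171",  # U + I -> WI
    ("\u1173", "\u1175"): "\u1174",  # EU + I -> YI
}


def compose_jamo(text: str) -> str:
    """Greedily replace adjacent simple-jamo pairs with compound jamo."""
    out = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in COMPOUND_JAMO:
            out.append(COMPOUND_JAMO[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def protocol_normalize(text: str) -> str:
    """NFD to expose jamo, compose jamo pairs, then NFC to rebuild syllables."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", compose_jamo(decomposed))


spelled_out = "\u1100\u1100\u1161"  # KIYEOK, KIYEOK, A
# Plain NFC composes only the second KIYEOK with A, giving U+1100 U+AC00.
print(unicodedata.normalize("NFC", spelled_out) == "\u1100\uac00")
# The sketch yields the single syllable U+AE4C instead.
print(protocol_normalize(spelled_out) == "\uae4c")
```

Running NFD first means text that already mixes a loose jamo with a precomposed syllable (for example U+1100 U+AC00) ends up in the same form as the fully spelled-out sequence.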
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Fri Dec 05 2003 - 18:31:55 EST