Re: GBK, HZ and EUC-TW - Unicode round-tripping policy

From: Tom Emerson (tree@basistech.com)
Date: Thu Jan 11 2001 - 09:43:28 EST


Michael (michka) Kaplan writes:
[...]
> As for (example) the case where there are two Euros that are the same, it is
> simple to simply choose one of them and always map it.

But then you lose round-trip behavior, which is necessary in some
applications. In cases like this I (and others, e.g., Microsoft) map
one of the ambiguous code points to the PUA, which allows you to
round-trip internally.
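
A minimal sketch of the PUA approach, for illustration only. The two
legacy values and the choice of U+E000 are placeholders, not taken from
any real GBK, HZ or EUC-TW mapping table; the only fixed fact assumed is
that U+E000..U+F8FF is a Private Use Area.

```python
# Two hypothetical legacy code points that both mean EURO SIGN.
# These values are placeholders, not entries from a real mapping table.
LEGACY_EURO_A = 0x80
LEGACY_EURO_B = 0xA2E3

EURO = "\u20ac"      # U+20AC EURO SIGN
PUA_EURO = "\ue000"  # arbitrary private-use code point standing in for the duplicate

# One-to-one in both directions, so nothing is lost on the way back.
TO_UNICODE = {LEGACY_EURO_A: EURO, LEGACY_EURO_B: PUA_EURO}
FROM_UNICODE = {uni: code for code, uni in TO_UNICODE.items()}


def decode(codes):
    """Map a sequence of legacy code points to a Unicode string."""
    return "".join(TO_UNICODE[c] for c in codes)


def encode(text):
    """Map a Unicode string back to the original legacy code points."""
    return [FROM_UNICODE[ch] for ch in text]
```

Because the table is a bijection, encode(decode(x)) returns x. The cost
is that the PUA character only means "Euro" inside your own system.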

Of course if you are unconcerned with maintaining round-trip behavior
(e.g., you just want to convert the text to Unicode so you can
display/edit it), then you map both legacy code points to the same
Unicode code point and are done with it.
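
A sketch of the many-to-one case, again with placeholder legacy values
rather than real table entries. It shows why the round trip breaks: the
reverse table can only name one legacy code point for U+20AC.

```python
# Hypothetical legacy code points for the same character (placeholders).
LEGACY_EURO_A = 0x80
LEGACY_EURO_B = 0xA2E3

EURO = "\u20ac"  # U+20AC EURO SIGN

# Both legacy code points collapse onto the same Unicode code point.
TO_UNICODE = {LEGACY_EURO_A: EURO, LEGACY_EURO_B: EURO}

# Going back, one legacy code point has to be picked as the target.
FROM_UNICODE = {EURO: LEGACY_EURO_A}


def decode_lossy(codes):
    """Map legacy code points to Unicode, merging the duplicates."""
    return "".join(TO_UNICODE[c] for c in codes)


def encode(text):
    """Map Unicode back to legacy; the original distinction is gone."""
    return [FROM_UNICODE[ch] for ch in text]


original = [LEGACY_EURO_A, LEGACY_EURO_B]
round_tripped = encode(decode_lossy(original))
```

Here round_tripped holds LEGACY_EURO_A twice, so it no longer equals
original. That is fine for display and editing, but not for anything
that needs the legacy bytes back unchanged.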

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


