RE: ICU's uconv vs Linux iconv and UTF-8

From: Yves Arrouye (yves@realnames.com)
Date: Fri Feb 01 2002 - 13:28:18 EST


>> As part of the mystery of CJK encodings, I notice that IBM's ICU's
>> uconv and SuSE 6.4 Linux iconv differ as to the UTF-8 representation
>> of table.euc.
>>
>> Both converters will round-trip with themselves and give a byte-exact
>> copy of table.euc.
>>
>> Weirdly, they differ in how they map '\' and '~' in the ASCII range,
>> as well as at some spots among the higher characters.

That is understandable if they use different tables. The question is which
one is the "right" EUC-JP, and which one users want. ICU, as well as iconv,
could have two tables with the different mappings. The question then is how
to label them, and whether the labels should be compatible between the two.
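A small illustration of how two tables can each round-trip byte-exactly and
still disagree in UTF-8. This is a minimal sketch under stated assumptions,
not ICU's or glibc's actual tables: it covers only the single-byte range, and
the table names are hypothetical. One table treats 0x5C and 0x7E as ASCII
(U+005C REVERSE SOLIDUS, U+007E TILDE); the other treats them as JIS X 0201
Roman (U+00A5 YEN SIGN, U+203E OVERLINE).

```python
# Hypothetical single-byte tables; real EUC-JP converters also map the
# two- and three-byte JIS X 0208/0212 ranges, which are omitted here.
ASCII_TABLE = {b: chr(b) for b in range(0x80)}  # 0x5C -> '\', 0x7E -> '~'
ROMAN_TABLE = dict(ASCII_TABLE)
ROMAN_TABLE[0x5C] = "\u00a5"  # YEN SIGN (JIS X 0201 Roman)
ROMAN_TABLE[0x7E] = "\u203e"  # OVERLINE (JIS X 0201 Roman)


def to_utf8(data: bytes, table: dict) -> bytes:
    """Decode single-byte EUC-JP-like data with `table`, encode as UTF-8."""
    return "".join(table[b] for b in data).encode("utf-8")


def from_utf8(data: bytes, table: dict) -> bytes:
    """Inverse of to_utf8, using the reversed table (KeyError if unmapped)."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse[ch] for ch in data.decode("utf-8"))


sample = b"C:\\tmp ~user"
for name, table in (("ascii", ASCII_TABLE), ("roman", ROMAN_TABLE)):
    utf8 = to_utf8(sample, table)
    assert from_utf8(utf8, table) == sample  # each table round-trips with itself
    print(name, utf8)
```

Both tables pass their own round-trip check, yet the UTF-8 differs exactly at
the '\' and '~' positions. In this sketch, feeding the "roman" UTF-8 to
from_utf8 with ASCII_TABLE raises KeyError, which is one plausible shape of
a converter refusing the other one's UTF-8.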

>> Linux iconv will not take ICU's UTF-8.
>> ICU's uconv will read the iconv output but does produce the same as the
>> original table.euc.

I find this statement confusing. Are you saying that uconv's UTF-8 is
ill-formed? Nick, would you mind emailing me (just me, not the list) your
table.euc sample file?
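One quick way to check whether a converter's output is well-formed UTF-8,
assuming Python 3 is available: its strict UTF-8 decoder rejects overlong
forms, encoded surrogates and truncated sequences, and the exception reports
where the first bad byte sits. The script name check_utf8.py and the helper
first_utf8_error are hypothetical.

```python
import sys


def first_utf8_error(data: bytes):
    """Return None if `data` is well-formed UTF-8, else (offset, bad_bytes)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start/exc.end delimit the first offending byte sequence.
        return exc.start, data[exc.start:exc.end]
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py FILE  (FILE is e.g. uconv's UTF-8 output)
    with open(sys.argv[1], "rb") as f:
        result = first_utf8_error(f.read())
    if result is None:
        print("well-formed UTF-8")
    else:
        offset, bad = result
        print(f"ill-formed UTF-8 at byte {offset}: {bad!r}")
        sys.exit(1)
```

Running it on both uconv's and iconv's output for table.euc would separate a
genuinely ill-formed UTF-8 problem from a mere mapping disagreement over
characters such as '\' and '~'.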

Thanks,
YA



This archive was generated by hypermail 2.1.2 : Fri Feb 01 2002 - 13:01:03 EST