Re: ICU's uconv vs Linux iconv and UTF-8

From: Dan Kogai (dankogai@dan.co.jp)
Date: Fri Feb 01 2002 - 10:33:53 EST

Previous message: Dan Kogai: "Re: ICU's uconv vs Linux iconv and UTF-8"
In reply to: Mark Leisher: "Re: ICU's uconv vs Linux iconv and UTF-8"
Next in thread: Dan Kogai: "Re: ICU's uconv vs Linux iconv and UTF-8"

On 2002.02.01, at 23:57, Mark Leisher wrote:
> Dan> FYI I have reported this brain-dead mapping problem to Unicode
> Dan> Consortium but never got an answer. Well, they are not public
> Dan> society in a way they charge for the membership to say anything.
> Dan> One of the reasons so many Japanese love to hate Unicode...
>
> This kind of false information is why many Japanese continue to love to
> hate Unicode. If you were actually on the Unicode mailing list, you
> wouldn't be repeating garbage like this.
>
> Sign up and send a message about the mapping tables. You will get an
> answer.

I signed up to unicode@unicode.org a long time ago, or at least I thought
I had, since I am still getting invitations to conferences and such. But
when I checked with lister@unicode.org, it subscribed my address again
instead of returning an error message saying I was already subscribed.
Hmm.... Anyway, I have resubscribed, so here I go....
Okay, here it is. Let me begin with the original message. Sorry for the
repetition, folks on perl-unicode@perl.org.

> On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
>> As part of the mystery of CJK encodings I notice that IBM's ICU's uconv
>> and SuSE6.4 linux iconv differ as to the UTF-8 representation of
>> table.euc
>>
>> Both converters will round-trip with themselves and give byte exact
>> copy of table.euc
>>
>> Weirdly they differ in how they map '\' and '~' in ASCII space as
>> well as some spots in higher characters.
>
> Oh, yes. This is the problem with the original Unicode 2.x map: it is
> not ASCII preservative. I posted this problem to
> perl-unicode@perl.org when I first released Jcode. Several discussions
> later, I made Jcode preserve ASCII by default and added
> $Jcode::Unicode::PEDANTIC to change the behavior.
> Here is the excerpt from Jcode::Unicode
>
> VARIABLES
> $Jcode::Unicode::PEDANTIC
> When set to non-zero, x-to-unicode conversion becomes
> pedantic. That is, '\' (chr(0x5c)) is converted to
> zenkaku backslash and '~' (chr(0x7e)) to JIS-x0212
> tilde.
>
> By Default, Jcode::Unicode leaves ascii ([0x00-0x7f])
> as it is.
>
>> Linux iconv will not take ICU's UTF-8.
>> ICU's uconv will read the iconv output but does produce same as
>> original table.euc.
>
> So far as I can see, Linux iconv is ASCII preservative while ICU's is
> Unicode-strict.
> From Perl's point of view ASCII preservative should be the default.
> FYI I have reported this brain-dead mapping problem to Unicode
> Consortium but never got an answer. Well, they are not public society
> in a way they charge for the membership to say anything. One of the
> reasons so many Japanese love to hate Unicode...
>
>> Our current euc-jp.ucm is compatible with Linux iconv.
>
> Right choice.
>
> Dan the Man with So Many Charsets to Deal With

Now let me repeat the question I asked a long time ago. Why does the
Unicode - JISX2xxx map still fail to preserve the ASCII part, even though
most converters ignore the original map and leave the ASCII part as is?
One more question. Where have the contents of
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ gone?
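
To make the ASCII question concrete, here is a minimal Python sketch of the two behaviors discussed in this thread: leaving 0x00-0x7f untouched versus remapping '\' and '~'. It is not taken from Jcode, iconv, or ICU. The function name, the override table, and the two target code points (U+FF3C and U+FF5E) are assumptions chosen only for illustration; the actual values depend on which mapping table a converter follows.

```python
# Illustrative sketch only. The two target code points below are
# assumptions, not values copied from any published mapping table.
PEDANTIC_OVERRIDES = {
    0x5C: "\uFF3C",  # assumed stand-in for "zenkaku backslash"
    0x7E: "\uFF5E",  # assumed stand-in for the non-ASCII tilde
}


def decode_ascii_range(data: bytes, pedantic: bool = False) -> str:
    """Decode bytes in 0x00-0x7F, optionally applying the overrides."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("this sketch only handles the ASCII range")
        if pedantic and b in PEDANTIC_OVERRIDES:
            out.append(PEDANTIC_OVERRIDES[b])
        else:
            out.append(chr(b))
    return "".join(out)


sample = b"C:\\home\\~dan"
preserved = decode_ascii_range(sample)
strict = decode_ascii_range(sample, pedantic=True)

# Compare the UTF-8 bytes of each result with the original input.
print(preserved.encode("utf-8") == sample)
print(strict.encode("utf-8") == sample)
```

With the default setting the UTF-8 output is byte-identical to the ASCII input, while the pedantic setting yields different bytes for the same text. That difference is the kind of mismatch that could explain why one converter's UTF-8 is not accepted by the other.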

_____ Dan Kogai
   __/ ____ CEO, DAN co. ltd.
  /__ /-+-/ 2-8-14-418 Shiomi Koto-ku Tokyo 135-0052 Japan
    /--/--- mailto: dankogai@dan.co.jp / http://www.dan.co.jp/ ---------
__/ / Tel:+81 3-5665-6131 Fax:+81 3-5665-6132
          GPG Key: http://www.dan.co.jp/~dankogai/dankogai.gpg.asc


This archive was generated by hypermail 2.1.2 : Fri Feb 01 2002 - 10:03:23 EST