Re: ICU's uconv vs Linux iconv and UTF-8

From: Dan Kogai (dankogai@dan.co.jp)
Date: Fri Feb 01 2002 - 13:57:37 EST


Marco,

   Thank you for elaborating on my points.

On 2002.02.02, at 01:40, Marco Cimarosti wrote:
> << The entire former contents of this directory are obsolete and have
> been moved to the OBSOLETE directory. The latest information may be
> found in the Unihan.txt file in the latest Unicode Character Database.
> August 1, 2001. >>
>
> And don't bother to download the 23 Mb
> <http://www.unicode.org/Public/UNIDATA/Unihan.txt> file, because it
> contains only mappings for kanji's.

   Yes. That's point #0: Unihan.txt is no replacement for MAPPINGS.
Maybe I can come up with a script that generates a table out of it,
but this kind of attitude is far from nice.
   Unihan.txt also lacks 8-bit mappings such as JIS X 0201.
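
   For what it is worth, such a script could look roughly like the
sketch below. It is only an illustration under assumptions: it assumes
Unihan.txt lines are tab-separated as "U+XXXX<TAB>field<TAB>value", and
that the kJis0 field carries the JIS X 0208 position as four decimal
digits (ku, then ten). The function names are hypothetical, and the
sketch does nothing about the 8-bit sets that Unihan.txt does not
cover.

```python
def jis0208_table(lines):
    """Build {(ku, ten): code point} from Unihan-style lines.

    Assumes each data line is "U+XXXX<TAB>kJis0<TAB>KKTT", where KKTT
    is the ku/ten position in decimal. Other fields are skipped.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kJis0":
            continue  # not a JIS X 0208 mapping line
        if not fields[0].startswith("U+"):
            continue  # unexpected code point syntax
        codepoint = int(fields[0][2:], 16)
        value = fields[2].split()[0]
        ku, ten = int(value[:2]), int(value[2:])
        table[(ku, ten)] = codepoint
    return table


def euc_jp_bytes(ku, ten):
    """Return the two EUC-JP bytes for a JIS X 0208 ku/ten position."""
    return bytes([ku + 0xA0, ten + 0xA0])


def mapping_line(ku, ten, codepoint):
    """Format one row as "0xJJJJ<TAB>0xUUUU" (JIS code, then Unicode)."""
    jis = ((ku + 0x20) << 8) | (ten + 0x20)
    return "0x%04X\t0x%04X" % (jis, codepoint)
```

   Feeding it the whole file would be something like
jis0208_table(open("Unihan.txt", encoding="utf-8")), and each entry can
then be printed with mapping_line() to get a table in the two-column
style of the old mapping files.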

> So, go directly to
> <http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/>, where you
> can find the old data, along with a note about mapping errors:

   But this time, they are right about being OBSOLETE.

> Below is some analysis by Asmus Freytag of specific problems raised by
> T. Kubota in this document:
> http://www.debian.or.jp/~kubota/unicode-symbols.html

   An English version is also available at

        http://www.debian.or.jp/~kubota/unicode-symbols.html.en

   Let me quote the part that matters here.

> ASCII and JIS X 0201 Roman
>
> When converting EUC-JP and Shift_JIS, handling of 0x5c and 0x7e can be
> a problem. Since both encodings have long history and Japanese people
> have lot of experience how to handle them, I now introduce it.
>
> Solution is very simple. Just regard YEN SIGN and REVERSE SOLIDUS as a
> different glyphs of the same character. Then, distinction between ASCII
> and JIS X 0201 Roman can be neglected.
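
   To make the quoted suggestion concrete, here is a minimal sketch of
what "regard them as the same character" could mean on the Unicode
side: fold YEN SIGN (U+00A5) onto REVERSE SOLIDUS (U+005C) before
comparing or converting text. The quote names only that pair; folding
OVERLINE (U+203E) onto TILDE (U+007E) for 0x7e is my assumption, based
on JIS X 0201 Roman having an overline at that position. The names
FOLD and fold_jis_roman are hypothetical, not part of any library.

```python
# Hypothetical folding table: JIS X 0201 Roman variants -> ASCII.
FOLD = {
    0x00A5: 0x005C,  # YEN SIGN -> REVERSE SOLIDUS (byte 0x5c)
    0x203E: 0x007E,  # OVERLINE -> TILDE (byte 0x7e); assumed pairing
}


def fold_jis_roman(text):
    """Treat the JIS X 0201 Roman variants as their ASCII counterparts."""
    return text.translate(FOLD)
```

   With this, a path written with YEN SIGN compares equal to the same
path written with REVERSE SOLIDUS once both are folded. The price is
that the distinction is lost, so this is a one-way normalization, not
a round-trip mapping.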

   Has anyone at the Unicode Consortium seen this one?

Dan


