Re: Sun's Java encodings vs IANA's character set registry

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Apr 13 2001 - 14:32:16 EDT

Next message: Peter_Constable@sil.org: "Re: benefits of unicode"
Previous message: Roozbeh Pournader: "Re: Unicode Collation Algorithm"
In reply to: Mike Brown: "RE: Sun's Java encodings vs IANA's character set registry"
Next in thread: Keld Jørn Simonsen: "Re: Sun's Java encodings vs IANA's character set registry"
Reply: Keld Jørn Simonsen: "Re: Sun's Java encodings vs IANA's character set registry"

It looks to me like the "Cp" names might be IBM CCSIDs. For those, have a look at the "ibm-" names in ICU's alias table at http://oss.software.ibm.com/cvs/icu/~checkout~/icu/data/convrtrs.txt

Note that ICU uses "cp" to mean Microsoft codepage numbers.
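To see which names a given Java runtime actually treats as the same encoding, one can ask it for a charset's canonical name and aliases. The following is only a sketch: it uses the java.nio.charset API (which is newer than this message), the class and method names are made up for illustration, and names such as "Cp1252" or "ibm-943" are just sample inputs that a particular JVM may or may not recognize.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Java or ICU.
final class CharsetAliases {

    /**
     * Returns the canonical name plus all aliases this JVM knows for the
     * given charset name, or an empty set if the name is not supported.
     */
    static Set<String> namesFor(String name) {
        Set<String> names = new TreeSet<>();
        try {
            if (!Charset.isSupported(name)) {
                return names;
            }
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal charset names.
            return names;
        }
        Charset cs = Charset.forName(name);
        names.add(cs.name());
        names.addAll(cs.aliases());
        return names;
    }

    public static void main(String[] args) {
        // Sample names only; the output depends entirely on the JVM in use.
        for (String n : new String[] {"Cp1252", "windows-1252", "ibm-943", "Cp943"}) {
            System.out.println(n + " -> " + namesFor(n));
        }
    }
}
```

An empty result for a name only means that this particular runtime does not know it, not that the name is wrong in ICU's or IANA's terms.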

Note also that even IBM changes some of its tables over time and, in a few dozen cases, has multiple Unicode<->codepage tables per CCSID (see our entries for ibm-943 and ibm-1363).
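One way to check whether two tables disagree is to decode the same bytes with each and compare the resulting code points. This is a rough sketch, not ICU code: the class and method names are hypothetical, and the charset names in main ("Shift_JIS", "windows-31j") are only assumed to be present in the JVM; any that are missing are skipped.

```java
import java.nio.charset.Charset;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for comparing conversion tables.
final class TableComparison {

    /** Decodes the bytes with every supported charset in the list. */
    static Map<String, String> decodeWithEach(byte[] bytes, String... charsetNames) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) {
                // Malformed or unmappable input becomes U+FFFD here.
                out.put(name, new String(bytes, Charset.forName(name)));
            }
        }
        return out;
    }

    /** True if at least two of the supported charsets decode the bytes differently. */
    static boolean tablesDisagree(byte[] bytes, String... charsetNames) {
        return new HashSet<>(decodeWithEach(bytes, charsetNames).values()).size() > 1;
    }

    /** Formats a string as a space-separated list of U+XXXX code points. */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A double-byte sequence chosen for illustration; check the result
        // on your own JVM rather than assuming the tables agree or differ.
        byte[] bytes = {(byte) 0x81, (byte) 0x60};
        decodeWithEach(bytes, "Shift_JIS", "windows-31j")
                .forEach((name, text) -> System.out.println(name + ": " + codePoints(text)));
    }
}
```

Running the same comparison over every byte sequence of interest gives a crude diff between two tables that claim to implement the same codepage.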

"Haphazard" is a good description of the situation...
It is easy to have "repertoires" - the hard part is to have "one repertoire". The situation is beyond repair, although we (ICU) are still collecting and publishing data. Use Unicode, UTFs, SCSU.

markus

Mike Brown wrote:
...
> I should not be surprised by your statement, but I am. It is distressing to
> think that something that by definition should not be rocket science --
> repertoires of abstract characters mapped directly to specific bit patterns
> -- would be subject to such haphazard definition and even more haphazard
> implementation.

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:15 EDT