It looks to me like the "Cp" names might be IBM CCSIDs. For those, have a look at the "ibm-" names in ICU's alias table at http://oss.software.ibm.com/cvs/icu/~checkout~/icu/data/convrtrs.txt
Note that ICU uses "cp" to mean Microsoft codepage numbers.
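As a rough illustration of how loosely the "cp" prefix is used across libraries, here is a minimal sketch that asks Python's own codec registry (not ICU's alias table) for the canonical name behind a label. The helper name canonical_name is hypothetical, and Python's aliases are only one implementation's choices. They should not be read as a statement about ICU or about IBM's CCSID registry.

```python
import codecs


def canonical_name(label: str) -> str:
    """Return the name Python's codec registry resolves `label` to.

    Hypothetical helper for illustration only; it reflects Python's
    alias table, which is independent of ICU's convrtrs.txt.
    """
    return codecs.lookup(label).name


# Python files both an IBM EBCDIC page and a Microsoft Windows page
# under the same "cp" prefix.
for label in ("ibm037", "windows-1252"):
    print(label, "->", canonical_name(label))
```

Python files an IBM EBCDIC table and a Microsoft Windows table under the same prefix, so a bare "cp" number does not say whose numbering it follows.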
Note also that even IBM changes some of its tables over time, and in a few dozen cases it has multiple Unicode<->codepage tables for a single CCSID (see our entries for ibm-943 and ibm-1363).
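To make the "multiple tables for nominally the same codepage" problem concrete, here is a small sketch using two Shift-JIS variants that ship with Python. It is an analogy, not ICU's ibm-943 data: the codec names shift_jis and cp932 are Python's, and the helper decode_variants is hypothetical.

```python
def decode_variants(data: bytes, codec_names: list[str]) -> dict[str, str]:
    """Decode the same bytes with several codecs and collect the results.

    Hypothetical helper for illustration; the codecs are Python's
    built-in tables, not the ICU or IBM mapping files.
    """
    return {name: data.decode(name) for name in codec_names}


# The same two bytes, decoded with two Shift-JIS variants.
results = decode_variants(b"\x81\x60", ["shift_jis", "cp932"])
for name, text in results.items():
    print(f"{name}: U+{ord(text):04X}")
```

The two tables disagree on the Unicode code point for the same byte pair, so text decoded with one and re-encoded with the other may not round-trip.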
"Haphazard" is a good description of the situation...
It is easy to have "repertoires" - the hard part is to have "one repertoire". The situation is beyond repair, although we (ICU) are still collecting and publishing data. Use Unicode, UTFs, SCSU.
markus
Mike Brown wrote:
...
> I should not be surprised by your statement, but I am. It is distressing to
> think that something that by definition should not be rocket science --
> repertoires of abstract characters mapped directly to specific bit patterns
> -- would be subject to such haphazard definition and even more haphazard
> implementation.
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:15 EDT