Mapping tables available

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Wed Aug 14 1996 - 12:41:56 EDT


I am making available the mapping tables we use to convert between Unicode 1.1
and various character sets and transliterations. The archive is available at:

  ftp://crl.nmsu.edu/CLR/multiling/character-sets/csets.tar.gz
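Because the tables are used for conversion in both directions, the following is a minimal sketch of how a single-byte table might be applied once it has been loaded as (native code, Unicode code point) integer pairs. The pair representation and the names `decode_single_byte` and `encode_single_byte` are hypothetical, not part of the distribution.

```python
def decode_single_byte(data, pairs):
    """Decode bytes using a single-byte table given as (native, unicode) pairs.

    Hypothetical helper. Unmapped bytes become U+FFFD REPLACEMENT CHARACTER.
    """
    table = {}
    for native, uni in pairs:
        table.setdefault(native, uni)  # keep the first entry if a byte is listed twice
    return "".join(chr(table.get(b, 0xFFFD)) for b in data)


def encode_single_byte(text, pairs):
    """Encode text back to bytes with the same (native, unicode) pairs.

    Hypothetical helper. Several bytes may map to one code point, so the
    first listed byte wins; characters without a mapping raise ValueError.
    """
    reverse = {}
    for native, uni in pairs:
        reverse.setdefault(uni, native)
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp not in reverse:
            raise ValueError(f"U+{cp:04X} has no mapping in this table")
        out.append(reverse[cp])
    return bytes(out)


# Hypothetical two-entry table: 0x41 -> 'A', 0xE1 -> U+00DF (as in cp437).
pairs = [(0x41, 0x0041), (0xE1, 0x00DF)]
print(decode_single_byte(b"A\xe1\x00", pairs))  # 0x00 is unmapped, so it decodes to U+FFFD
print(encode_single_byte("A\u00df", pairs))
```

Multi-byte sets such as big5 or shift-jis need a real byte-sequence parser in front of the lookup; this sketch deliberately covers only the single-byte case.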

Much of this data was derived from the tables available at:

  ftp://unicode.org/pub/MappingTables/

The data has been rearranged and simplified somewhat to make it more
consistent and easier to process in software. Please note that the tables are
not necessarily one-to-one mappings!
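Since a table is not necessarily one-to-one, a program that inverts it should check for collisions first. The following is a minimal sketch, assuming the table has already been read into (native code, Unicode code point) integer pairs; `find_non_bijective` is a hypothetical helper name, and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def find_non_bijective(pairs):
    """Report entries that break a one-to-one mapping.

    Returns (one_to_many, many_to_one):
      one_to_many  maps a native code to the sorted Unicode code points it has
      many_to_one  maps a Unicode code point to the sorted native codes that share it
    """
    forward = defaultdict(set)  # native code -> Unicode code points
    reverse = defaultdict(set)  # Unicode code point -> native codes
    for native, uni in pairs:
        forward[native].add(uni)
        reverse[uni].add(native)
    one_to_many = {k: sorted(v) for k, v in forward.items() if len(v) > 1}
    many_to_one = {k: sorted(v) for k, v in reverse.items() if len(v) > 1}
    return one_to_many, many_to_one


# Hypothetical sample: two native codes both mapped to U+00A0 NO-BREAK SPACE.
sample = [(0x20, 0x0020), (0xA0, 0x00A0), (0xFF, 0x00A0)]
fwd, rev = find_non_bijective(sample)
print(fwd)  # {}
print(rev)  # {160: [160, 255]}
```

A many-to-one entry means decoding is fine but encoding must choose one native code; a one-to-many entry means the reverse.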

If there are enough requests, I will extract the tables so they can be
retrieved individually. The archive contains 72 mapping tables, all in plain
ASCII with Unix line separators (Ctrl-J).
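A reader for these files can enforce the plain-ASCII, LF-only property while parsing. The sketch below is hedged heavily: the line layout it expects (two hex columns such as `0x41  0x0041`, with optional `#` comments) is an ASSUMPTION modelled on the unicode.org MappingTables files, and the actual `.cset` layout may differ. `parse_cset_bytes` and `load_cset` are hypothetical names.

```python
def parse_cset_bytes(raw, name="<table>"):
    """Parse mapping-table bytes into (native_code, unicode_code_point) pairs.

    ASSUMED layout (modelled on unicode.org MappingTables; the real .cset
    layout may differ): two hex columns per line, optionally followed by a
    "#" comment. Blank and comment-only lines are skipped.
    """
    text = raw.decode("ascii")  # raises UnicodeDecodeError if not plain ASCII
    if "\r" in text:
        raise ValueError(f"{name}: expected Unix (LF-only) line separators")
    pairs = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        body = line.split("#", 1)[0].strip()  # drop the comment and surrounding blanks
        if not body:
            continue
        fields = body.split()
        if len(fields) < 2:
            raise ValueError(f"{name}:{lineno}: expected two hex columns")
        # int(..., 16) accepts an optional "0x" prefix.
        pairs.append((int(fields[0], 16), int(fields[1], 16)))
    return pairs


def load_cset(path):
    """Read a table file from disk (hypothetical helper) and parse it."""
    with open(path, "rb") as f:
        return parse_cset_bytes(f.read(), name=path)
```

For example, `load_cset("8859-5.cset")` would return the pairs for that file if it follows the assumed layout; a file with CR-LF line endings or non-ASCII bytes is rejected with a `ValueError` (of which `UnicodeDecodeError` is a subclass).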

A number of other mapping tables are in progress (e.g. more Russian mappings,
mappings from ftp://prep.ai.mit.edu/pub/gnu/recode-3.4.tar.gz, and mappings
from "tcs").

Please feel free to alert me to any problems with the tables or anything else
in the distribution.

Contributions of other character set and transliteration data are always welcome!

The current mapping tables are:

Filename         CS Name        Short Description
-------------------------------------------------
8859-1.cset      iso8859-1      Western European
8859-2.cset      iso8859-2      Slavic and Central European
8859-3.cset      iso8859-3      Esperanto, Galician, Maltese, and Turkish
8859-4.cset      iso8859-4      Estonian, Latvian, and Lithuanian
8859-5.cset      iso8859-5      Cyrillic
8859-6.cset      iso8859-6      Arabic
8859-7.cset      iso8859-7      Greek
8859-8.cset      iso8859-8      Hebrew
8859-9.cset      iso8859-9      Other characters for Turkish
armscii.cset     armscii        Armenian
big5.cset        big5           Big5 Traditional Chinese
cp1250.cset      cp1250         Superset of iso8859-2 (Latin 2)
cp1251.cset      cp1251         Superset of iso8859-5 (Cyrillic)
cp1252.cset      cp1252         Superset of iso8859-1 (Latin 1)
cp1253.cset      cp1253         Superset of iso8859-7 (Greek)
cp1254.cset      cp1254         Superset of iso8859-9 (Latin 5)
cp1255.cset      cp1255         Superset of iso8859-8 (Hebrew)
cp1256.cset      cp1256         Superset of iso8859-6 (Arabic)
cp1257.cset      cp1257         Superset of iso8859-4 (Latin 4)
cp1258.cset      cp1258         Windows Vietnamese
cp437.cset       cp437          DOS LatinUS encoding
cp737.cset       cp737          DOS Greek encoding
cp775.cset       cp775          DOS BaltRim encoding
cp850.cset       cp850          DOS Latin1 encoding
cp852.cset       cp852          DOS Latin2 encoding
cp855.cset       cp855          DOS Cyrillic encoding
cp857.cset       cp857          DOS Turkish encoding
cp860.cset       cp860          DOS Portuguese encoding
cp861.cset       cp861          DOS Icelandic encoding
cp862.cset       cp862          DOS Hebrew encoding
cp863.cset       cp863          DOS CanadaFrench encoding
cp865.cset       cp865          DOS Nordic encoding
cp866.cset       cp866          DOS CyrillicRussian encoding
cp869.cset       cp869          DOS Greek2 encoding
cp874.cset       cp874          DOS Thai encoding
ethiopic.cset    ethiopic       CRL/NMSU Ethiopic EUC encoding
gb12345.cset     gb12345        Extended Simplified Chinese
gb2312.cset      gb2312         Simplified Chinese
ibm037.cset      cp037          IBM US/Canada EBCDIC CP037 encoding
ibm1026.cset     cp1026         IBM Turkish EBCDIC CP1026 encoding
ibm500.cset      cp500          IBM International EBCDIC CP500 encoding
ibm875.cset      cp875          IBM Greek EBCDIC CP875 encoding
isiri3342.cset   isiri3342      Persian + 8859-6 extensions
isoir111.cset    isoir111       KOI8-based extended Cyrillic
jisx0201.cset    jisx0201       Katakana
jisx0208.cset    jisx0208       Japanese
jisx0212.cset    jisx0212       Extended Japanese
koi8.cset        koi8           KOI8 with CRL/NMSU extensions for SerboCroat
ksc5601-87.cset  ksc5601        Korean
lao.cset         lao            CRL/NMSU Lao 8-bit encoding
macarab.cset     mac_arabic     Apple Macintosh Arabic encoding
maccroat.cset    mac_croatian   Apple Macintosh Croatian encoding
maccyr.cset      mac_cyrillic   Apple Macintosh Cyrillic (Microsoft cp10007) encoding
macgreek.cset    mac_greek      Apple Macintosh Greek (Microsoft cp10006) encoding
machebr.cset     mac_hebrew     Apple Macintosh Hebrew encoding
macice.cset      mac_icelandic  Apple Macintosh Icelandic (Microsoft cp10079) encoding
maclat2.cset     mac_latin2     Apple Macintosh Central European (Microsoft cp10029) encoding
macroman.cset    mac_roman      Apple Macintosh Roman (Microsoft cp10000) encoding
macruman.cset    mac_romanian   Apple Macintosh Romanian encoding
macthai.cset     mac_thai       Apple Macintosh Thai encoding
macturk.cset     mac_turkish    Apple Macintosh Turkish (Microsoft cp10081) encoding
macukr.cset      mac_thai       Apple Macintosh Ukrainian encoding
nbsc.cset        nbsc           Partial NotaBene Latin SerboCroat encoding
nbyte.cset       nbyte          Old N-byte Hangul encoding/transliteration
shift-gb.cset    sgb            Old Shift-GuoBiao Simplified Chinese
shift-jis.cset   sjis           Shift-JIS Japanese
tis620.cset      tis620         Thai
trigem.cset      trigem         Old Trigem (Microsoft) Hangul encoding
viqri.cset       viqri          Vietnamese Quoted Readable-Implicit encoding/transliteration
viscii.cset      viscii         Viet-STD VISCII 1.1
vscii-1.cset     vscii-1        TCVN Vietnamese 1
vscii-2.cset     vscii-2        TCVN Vietnamese 2
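Many of the single-byte sets listed above also have built-in codecs in later software, which gives a quick cross-check. The following sketch compares (native code, Unicode code point) pairs against a Python codec by name (for example "cp437", "iso8859_5", "koi8_r", "mac_roman"; note that Python's names do not always match the CS names above, e.g. "mac_iceland"). Differences are not necessarily errors in either source: Python's codecs reflect later vendor revisions, such as the euro sign at 0x80 in cp1252, which postdates these tables. `diff_against_codec` is a hypothetical helper name.

```python
def diff_against_codec(pairs, codec):
    """Compare single-byte (native, unicode) pairs with a Python codec.

    Returns a list of (byte, table_code_point, codec_code_point) tuples for
    every disagreement; codec_code_point is None when the codec leaves the
    byte undefined. Multi-byte native codes are skipped in this sketch.
    """
    mismatches = []
    for native, uni in pairs:
        if native > 0xFF:
            continue  # only single-byte character sets are handled here
        try:
            decoded = bytes([native]).decode(codec)
        except UnicodeDecodeError:
            mismatches.append((native, uni, None))
            continue
        got = ord(decoded) if len(decoded) == 1 else None
        if got != uni:
            mismatches.append((native, uni, got))
    return mismatches


# Hypothetical pairs: 0xE1 mapped to GREEK SMALL LETTER BETA instead of
# U+00DF, which is what Python's cp437 codec uses for that byte.
sample = [(0x41, 0x0041), (0xE1, 0x03B2)]
print(diff_against_codec(sample, "cp437"))  # [(225, 946, 223)]
```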
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher                   "A designer knows he has achieved perfection
Computing Research Lab          not when there is nothing left to add, but
New Mexico State University     when there is nothing left to take away."
Box 30001, Dept. 3CRL                          -- Antoine de Saint-Exupéry
Las Cruces, NM 88003



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT