Re: EBCDIC Encoding question

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Mon Nov 02 1998 - 09:09:32 EST


Hello,

On 1998-10-26 at 13:19, Julia Oesterle (Unicode) wrote:
> Can any EBCDIC people answer this fellows question?

Though I am not one of those "EBCDIC people", I can (as the local guru on
character encodings, and former EBCDIC user).

On 1998-10-22 at 12:31, Daniel Oppenheimer wrote:
> I am especially interested in converting between ASCII and EBCDIC.

Note that ASCII uses 7 bits per character, whilst EBCDIC uses 8 bits.
Hence, the mapping cannot be bijective.
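
The counting argument can be checked with a short sketch. It assumes Python's
built-in "cp037" codec as a stand-in for "EBCDIC" (only one of many variants);
the variable names are illustrative.

```python
# All 128 ASCII characters, encoded with Python's built-in "cp037" codec
# (one EBCDIC variant, chosen here only as an example).
ascii_chars = "".join(chr(i) for i in range(128))
ebcdic_bytes = ascii_chars.encode("cp037")

used = set(ebcdic_bytes)
unused = [b for b in range(256) if b not in used]

print(len(used), "EBCDIC byte values correspond to ASCII characters")
print(len(unused), "EBCDIC byte values have no ASCII counterpart")

# ASCII -> EBCDIC -> ASCII round-trips without loss ...
assert ebcdic_bytes.decode("cp037") == ascii_chars

# ... but the reverse direction does not: a byte outside the ASCII image
# decodes to a character that 7-bit ASCII cannot represent.
non_ascii = bytes([unused[0]]).decode("cp037")
print(repr(non_ascii), "is ASCII:", non_ascii.isascii())
```

So ASCII embeds into this EBCDIC variant, but a converter must decide what to
do with the remaining byte values when going from EBCDIC to ASCII.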

Note also that ASCII is a particular 7-bit code, viz. ISO 646 IRV,
whilst many vendors, and text-book authors, abuse the term "ASCII"
(or the similar term "ANSI") for a plethora of different encodings:
- MS-DOS abuses the term "ASCII" as a synonym for "text, in whatever
  8-bit code currently is selected via the 'mode' command", usually
  one of the IBM proprietary codes, CP 437 and CP 850;
- MS-Windows abuses the term "ANSI" for its proprietary 8-bit code,
  CP 1252 (and perhaps also for other MS proprietary codes, depending
  on the current language setting);
- many internet encoding utilities abuse the term "ASCII" for the
  8-bit code "Latin-1" (ISO 8859-1), or its predecessor, the DEC multi-
  lingual terminal code.
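
To see why the distinction matters, here is a small sketch that decodes one
and the same byte under several of the codes named above. The codec names are
those of Python's standard library; the byte value 0xE4 is an arbitrary
example.

```python
raw = bytes([0xE4])  # a single byte with the high bit set

# Decode the same byte under several 8-bit codes often mislabelled "ASCII".
decoded = {}
for codec in ("cp437", "cp850", "cp1252", "latin-1"):
    decoded[codec] = raw.decode(codec)
    print(f"{codec:8} -> {decoded[codec]!r}")

# Genuine 7-bit ASCII has no character at all for this byte value.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print("valid ASCII:", is_ascii)
```

Text labelled "ASCII" that contains such bytes can therefore only be
converted correctly once the real code page is known.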

> However, there appears to be more than one kind of EBCDIC.

Actually, there are 11 (or so) different EBCDICs for the Latin-1 character
set, currently supported (the so-called CECPs = "Country Extended Code Pages",
if I am not mistaken), several other EBCDIC variants for other character
sets, and several hundred legacy EBCDIC variants.
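
As an illustration of how two Latin-1 EBCDIC variants differ, the following
sketch compares Python's built-in "cp037" and "cp500" codecs on the printable
ASCII characters. It is only a quick check, not a substitute for the official
registry.

```python
import string

# Printable ASCII letters, digits and punctuation.
sample = string.ascii_letters + string.digits + string.punctuation

# Characters whose byte value differs between the two code pages.
differs = [ch for ch in sample if ch.encode("cp037") != ch.encode("cp500")]
same = [ch for ch in sample if ch not in differs]

for ch in differs:
    b037 = ch.encode("cp037")[0]
    b500 = ch.encode("cp500")[0]
    print(f"{ch!r}: cp037=0x{b037:02X}  cp500=0x{b500:02X}")

print(len(same), "of", len(sample), "sample characters agree")
```

Letters and digits sit at the same positions in both; the differences are
confined to a handful of punctuation characters such as the square brackets,
which is exactly what tends to break program source text moved between
systems.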

> I am working on an encoding converter.

Before embarking on any serious work concerning EBCDIC, you should obtain
your copy of the latest "CDRA Level 1 Reference" (SC09-1390) and "CDRA
Level 1 Registry" (SC09-1391) from your nearest IBM representative.

> Could someone tell me the difference between EBCDIC 500 and open EBCDIC?

What do you mean by "open EBCDIC"?

You may find the following tables useful:
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT>
     English (US) CECP, also used in Canada, Netherlands, Portugal, Brazil,
     Australia, and New Zealand
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT>
     Belgium, Switzerland, and International CECP
     (this was meant to become "the" international CECP, but this attempt
     has failed; meanwhile, CECP 1047 is the agreed standard)
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT>
     Greek EBCDIC
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP1026.TXT>
     Turkish EBCDIC (Latin-5 set)
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT>
     Windows code for Latin-1 countries (the "ANSI" misnomer)
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT>
     The "classic" IBM PC code -- but see below
  <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT>
     The "international" IBM PC codepage, containing (but not limited to)
     the Latin-1 character set -- but see below
  All mappings in <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/>
    are subject to the correction outlined in
    <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/IBM/README.TXT>.
All of these tables map various code pages to Unicode, resulting in a common
descriptive framework for those distinct code pages.
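
Because every table maps to Unicode, a converter can go from any code page to
any other via Unicode as the pivot. Below is a minimal sketch of reading such
a table. It assumes the usual layout of these files (code-page byte in hex,
Unicode value in hex, then a "#" comment); the embedded excerpt and the
function name parse_mapping are illustrative, not copied from the files.

```python
# Illustrative excerpt in the assumed layout: byte, Unicode value, comment.
SAMPLE = """\
# hypothetical excerpt of an EBCDIC-to-Unicode table
0x40\t0x0020\t#SPACE
0xC1\t0x0041\t#LATIN CAPITAL LETTER A
0xF0\t0x0030\t#DIGIT ZERO
0xFF\t\t#UNDEFINED
"""

def parse_mapping(text):
    """Return a dict {code-page byte: Unicode character}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode value: undefined
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table

to_unicode = parse_mapping(SAMPLE)
from_unicode = {ch: byte for byte, ch in to_unicode.items()}

print(to_unicode[0xC1])            # character for EBCDIC byte 0xC1
print(hex(from_unicode["0"]))      # EBCDIC byte for the digit zero
```

Parsing two such tables and chaining one table's to_unicode with the other's
from_unicode gives a direct byte-to-byte conversion; characters missing from
the target table need an explicit substitution policy.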

Best wishes,
   Otto Stolz


