---------------------- Information from the mail header -----------------------
Sender:       "X3L2, Codes and Character Set Committee" <X3L2@JHUVM.BITNET>
Poster:       Tim Greenwood 603-881-0575 ZKO2-3/Q18 19-Aug-1994 1159
              <greenwood@R2ME2.ENET.DEC.COM>
Subject:      RFC 1345 - POSIX involvement in Keld's character mnemonic scheme?
-------------------------------------------------------------------------------

The attached, rather old, memo from Keld was recently brought to my attention.
It attempts a short naming scheme for all non-ideographic and non-Hangul
characters in 10646, and uses this to correlate characters in many registered
coded character sets.

Keld mentions that the character mnemonics are taken from the ISO committee
draft of POSIX.2.

I have seen these two-character abbreviations before in memos from Keld and
Alain, but was not aware that POSIX was involved.

Does anyone know what, if anything, POSIX is doing in this field?

The actual memo is 102 pages long so I am sending an abridged version. I have
edited out most of the tables. I can send the full thing to the list or
individually if you wish.

Tim


Network Working Group                                        K. Simonsen
Request for Comments: 1345                   Rationel Almen Planlaegning
                                                               June 1992


                  Character Mnemonics & Character Sets

Status of the Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Summary

   This memo lists a selection of characters and their presence in some
   coded character sets. To facilitate the coded character set
   tabulations an unambiguous mnemonic for each character is used, and a
   format for tabulating the coded character sets is defined. The coded
   character sets are given names for easy reference. A family of coded
   character sets called the mnemonic character sets, and conversion
   between these coded character sets without information loss, are
   defined.

   The character set names are registered with the Internet Assigned
   Numbers Authority (IANA).  Additional character sets not described in
   this memo should be registered with the IANA. This memo may be
   updated periodically, or additional specifications may be published,
   to reflect other coded character sets.

   Please send any comments including comments about the accuracy of the
   tables to the author, Keld Simonsen <Keld.Simonsen@dkuug.dk>.

1.  INTRODUCTION

   With the growing internationalization of the Internet, support for
   many coded character sets is required. It is the intention of this
   memo to document precisely the mapping between all characters and
   their corresponding coded representations in various coded character
   sets, and give names to these coded character sets, so they can be
   referenced unambiguously in Internet standards.

   This memo does not indicate anything about the validity of using
   these specifications in any Internet standard, so you should consult
   each individual Internet standard to see which coded character sets
   and names are allowed there.

   Unambiguous character mnemonics are specified, which provide a
   practical way of identifying a character, without reference to a
   coded character set and its code in this coded character set.  The
   mnemonics are written in a minimal set of characters, namely the
   invariant 83 graphical characters of ISO 646, which is a kind of
   greatest common subset to be found between the majority of coded

Simonsen                                                        [Page 1]

RFC 1345          Character Mnemonics & Character Sets         June 1992


   character sets, including ASCII, national variants of the ISO 646
   7-bit character set and various EBCDICs.  In addition, the numeric
   values of the coded representations of all these characters are the
   same in all coded character sets compatible with ISO standards.  All
   of them except two, EXCLAMATION MARK and QUOTATION MARK, have the
   same coded representation in all variants of EBCDIC.  This minimal
   set of characters is called the reference character set in this memo.

   The mnemonics can be used in Internet standards for easy and
   unambiguous reference, and they can also serve as a fallback
   representation in various Internet specifications.

   The coded character sets covered include all parts of ISO 8859, ISO
   6937-2 and all ISO 646 conforming coded character sets in the ISO
   character set registry managed by ECMA according to ISO 2375.  Almost
   all graphic coded character sets in the ECMA registry (1) are
   covered.  The graphic coded character sets not included are registry
   numbers 31, 38, 39, 53, 59, 68, 71, 72, 129 and 137.  In addition
   many vendor defined character sets are covered, including PC
   codepages (4), (7), (8), many EBCDIC character sets (4), (5), (6) and
   HP, DEC and Apple character sets (8), (9), (10), (13), (14).  The
   East-Asian 16-bit character sets from the ECMA registry are also
   included in this memo.

2.  CHARACTER MNEMONICS

2.1  General Syntax

   The character mnemonics are taken from the ISO committee draft (CD)
   of the POSIX.2 standard (3).  They are classified into two groups:


   1. A group with two-character mnemonics
      - Primarily intended for alphabetic scripts like Latin, Greek,
        Cyrillic, Hebrew and Arabic, and special characters.
   2. A group with variable-length mnemonics
      - primarily intended for non-alphabetic scripts like Japanese and
        Chinese, but also used for some accented letters and special
        characters.

   In the two-character mnemonics, all invariant graphic characters in
   the ISO 646 character codes except "&" are used, i.e. the following
   characters:

           ! "     %   ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
             A B C D E F G H I J K L M N O P Q R S T U V W X Y Z       _
             a b c d e f g h i j k l m n o p q r s t u v w x y z

   The character "_" is not used as the first character.

   In the variable-length mnemonics, the character "_" is not used as
   the first character. If it is used in a name, its presence is
   doubled.
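
   A minimal sketch in Python of the lexical rules for two-character
   mnemonics stated above.  The names MNEMONIC_CHARS and
   is_two_char_mnemonic are illustrative assumptions, not part of the
   memo; the check covers only which characters may appear, not
   whether a mnemonic is actually assigned.

```python
import string

# The characters listed above (the list excludes "&").
MNEMONIC_CHARS = (
    set("!\"%'()*+,-./:;<=>?_")
    | set(string.digits)
    | set(string.ascii_letters)
)


def is_two_char_mnemonic(text: str) -> bool:
    """Check the lexical rules only; no table lookup is performed."""
    return (
        len(text) == 2
        and text[0] != "_"  # "_" is not used as the first character
        and all(ch in MNEMONIC_CHARS for ch in text)
    )
```

   For example, "a!" passes this check, while "_a" and "&a" do not.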

   The mnemonics can be used in several different ways for different
   purposes.  One of these is description of coded character sets, which
   is detailed in section 3.  Another is for extending a given coded
   character set to a mnemonic character set.  This is described in
   section 4.  The restrictions on the use of the characters "&" and "_"
   are due to demands of the compositional methods of these techniques.

2.2  ISO Official Long Descriptive Character Name

   For all mnemonics, the character for which it stands is indicated in
   the following table by a long descriptive name.  This name is
   identical to the ISO name of the character as given in reference (2).
   For a few characters that are not included there, descriptive names
   of the same kind are introduced in this memo.  The source of each
   character is stated in the table after the name and should be
   consulted for a reliable identification of the character.

   These long descriptive names consist only of the capital Latin
   letters of the invariant part of ISO 646, the digits, "-", and SPACE.
   Digits are only used in names of ideographic and Hangul characters
   and never as the first character.

2.3  The 2-character Mnemonics

   The two-character mnemonics include various accented Latin letters,
   Greek, Cyrillic, Hebrew, Arabic, Hiragana and Katakana.  Also a fair
   number of special characters are included.  Almost all ISO or ISO
   registered 7- and 8-bit graphical coded character sets are covered
   with these two-character mnemonics.

   The two characters are chosen so the graphical appearance in the
   reference set resembles as much as possible (within the possibilities
   available) the graphical appearance of the character. The basic
   character set of ISO 646 is used as the reference set, as mentioned
   above.

   The characters in the reference character set are chosen to represent
   themselves.

   For control characters from ISO 646 the two-character acronyms of ISO
   2047 are used as mnemonics.  For the other control characters of ISO
   6429, two-character mnemonics have been selected based on the
   variable-length acronyms used in that standard.

   Letters, including Greek, Cyrillic, Arabic and Hebrew, are
   represented with the base letter as the first letter, and the second
   letter represents an accent or relation to a non-Latin script.  Non-
   Latin letters are transliterated to Latin letters, following
   transliteration standards as closely as possible.  This is also done
   with the Latin letters such as ETH and THORN, and the
   Danish/Norwegian/Swedish letter A WITH RING ABOVE is transliterated
   into "aa".


   After a letter, the second character signifies the following:

     Exclamation mark           ! Grave
     Apostrophe                 ' Acute accent
     Greater-Than sign          > Circumflex accent
     Question Mark              ? Tilde
     Hyphen-Minus               - Macron
     Left parenthesis           ( Breve
     Full Stop                  . Dot Above
     Colon                      : Diaeresis
     Comma                      , Cedilla
     Underline                  _ Underline
     Solidus                    / Stroke
     Quotation mark             " Double acute accent
     Semicolon                  ; Ogonek
     Less-Than sign             < Caron
     Zero                       0 Ring above
     Two                        2 Hook
     Nine                       9 Horn

     Equals                     = Cyrillic
     Asterisk                   * Greek
     Percent sign               % Greek/Cyrillic special
     Plus                       + smalls: Arabic, capitals: Hebrew
     Three                      3 some Latin/Greek/Cyrillic letters
     Four                       4 Bopomofo
     Five                       5 Hiragana
     Six                        6 Katakana
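
   The accent rows of this list can be approximated with Unicode
   combining marks.  The sketch below is an assumption-laden
   illustration, not the memo's method: it pairs each accent
   character with a Unicode combining mark and relies on NFC
   normalization to find a precomposed letter.  Stroke and underline
   are left out because Unicode has no canonical compositions for
   them, and the mnemonic table itself remains the authority on which
   mnemonics are assigned.

```python
import unicodedata
from typing import Optional

# Accent character -> Unicode combining mark (assumed correspondence).
COMBINING_MARKS = {
    "!": "\u0300",  # grave
    "'": "\u0301",  # acute
    ">": "\u0302",  # circumflex
    "?": "\u0303",  # tilde
    "-": "\u0304",  # macron
    "(": "\u0306",  # breve
    ".": "\u0307",  # dot above
    ":": "\u0308",  # diaeresis
    ",": "\u0327",  # cedilla
    '"': "\u030b",  # double acute
    ";": "\u0328",  # ogonek
    "<": "\u030c",  # caron
    "0": "\u030a",  # ring above
    "2": "\u0309",  # hook (above)
    "9": "\u031b",  # horn
}


def compose_accented(mnemonic: str) -> Optional[str]:
    """Return the precomposed letter for base+accent, or None."""
    if len(mnemonic) != 2:
        return None
    base, accent = mnemonic
    mark = COMBINING_MARKS.get(accent)
    if mark is None or not (base.isascii() and base.isalpha()):
        return None
    composed = unicodedata.normalize("NFC", base + mark)
    # Only accept results that collapse to a single precomposed letter.
    return composed if len(composed) == 1 else None
```

   With this sketch, compose_accented("a!") yields LATIN SMALL LETTER
   A WITH GRAVE, and a script selector such as "a=" yields None.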

   In designing the mnemonics the following special characters were
   reserved: The ampersand is reserved as an intro character, indicating
   that the following string is in the mnemonic character set.  The
   underline character is reserved for the variable-length mnemonics.
   This use does not eliminate usage as an accent or language
   identifier.

   Special characters are encoded with some mnemonic value.  These are
   not systematic throughout, but most mnemonics start with a related
   special character of the reference set.

2.4  The Variable-length Character Mnemonics

   The Variable-length Character Mnemonics are primarily meant for the
   ideographic characters in larger Asian character sets, but are also
   used for accented characters with several accents and some special
   characters. To have the mnemonics as short as possible, which both
   saves storage and is easier to input, a quite short name is
   preferred. Considering the Chinese standard GB 2312-1980, the
   Japanese standards JIS X0208 and JIS X0212, and the Korean standard
   KS C 5601, they are all given by row and column numbers between 1 and
   94. So two positions for row and column and a character set
   identifier of one character would be almost as short as possible.
   The following character set identifiers are defined:

            c   GB 2312-1980
            j   JIS X0208-1990
            J   JIS X0212-1990
            k   KS C 5601-1987

   This system for the representation of ideographic characters and
   Hangul characters is not truly mnemonic, but it provides short
   representations that are easy to connect to the corresponding
   character by means of the code table of an official character set
   standard. Alternative methods based on the graphic appearance or the
   pronunciation of the characters are thought to be unfeasible.

   One prominent character in the reference character set is reserved
   for identifying variable-length mnemonics, namely the underline
   character "_". This character is intended as a delimiter both at the
   front and at the end of the mnemonic. An example of its use, with "&"
   as the intro character, would be:

             &_j3210_ &_j4436_&_j6530_
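
   A short Python sketch of how such mnemonics could be taken apart.
   It assumes the form described above: one character set identifier
   followed by a two-digit row and a two-digit column, each between 1
   and 94.  The names RowCol, parse_rowcol_mnemonic and find_delimited
   are hypothetical, and doubled underlines inside names are not
   handled.

```python
import re
from typing import List, NamedTuple

CHARSET_IDS = {
    "c": "GB 2312-1980",
    "j": "JIS X0208-1990",
    "J": "JIS X0212-1990",
    "k": "KS C 5601-1987",
}


class RowCol(NamedTuple):
    charset: str
    row: int
    column: int


_ROWCOL = re.compile(r"([cjJk])([0-9]{2})([0-9]{2})")


def parse_rowcol_mnemonic(mnemonic: str) -> RowCol:
    """Split e.g. "j3210" into character set, row and column."""
    match = _ROWCOL.fullmatch(mnemonic)
    if match is None:
        raise ValueError(f"not a row/column mnemonic: {mnemonic!r}")
    row, column = int(match.group(2)), int(match.group(3))
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError(f"row/column out of range: {mnemonic!r}")
    return RowCol(CHARSET_IDS[match.group(1)], row, column)


def find_delimited(text: str, intro: str = "&") -> List[str]:
    """Collect the mnemonics written as intro + "_" + name + "_"."""
    return re.findall(re.escape(intro) + r"_([^_]+)_", text)
```

   Applied to the example line, find_delimited returns the three
   mnemonics j3210, j4436 and j6530.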

3.  CHARACTER MNEMONIC TABLE

   The following table contains the character mnemonic and the encoding
   and long descriptive name of ISO 2DIS 10646 (2).  Although the ISO
   10646 is only at DIS stage at this moment of writing and there is
   quite some debate about it, the long descriptive naming in the DIS is
   considered to be stable and the best official ISO reference to
   character names. The 2-octet encoded value of the ISO 2DIS 10646 is
   also used, but only as an identification of the character, and it
   should only be used for identification purposes as the coded
   representation may be changed in the final 10646 international
   standard. Some characters not in the ISO 2DIS 10646 are allocated
   values in the private use zone and given names and references to a
   character set where it is used.

   The format of the table is:

   1st field is the character mnemonic (mostly 2 characters).
   2nd field is the ISO 2DIS 10646 code in hexadecimal.
   3rd field is the long descriptive name of ISO 2DIS 10646.

 SP     0020    SPACE
 !      0021    EXCLAMATION MARK
 "      0022    QUOTATION MARK
 Nb     0023    NUMBER SIGN
 DO     0024    DOLLAR SIGN
 %      0025    PERCENT SIGN
 &      0026    AMPERSAND
 '      0027    APOSTROPHE
 (      0028    LEFT PARENTHESIS
 )      0029    RIGHT PARENTHESIS
 *      002a    ASTERISK
 +      002b    PLUS SIGN

(35 pages deleted)

                part)
 "'     e007    NON-SPACING ACUTE ACCENT (ISO-IR-103 194) (character
                part)
 ">     e008    NON-SPACING CIRCUMFLEX ACCENT (ISO-IR-103 195)
                (character part)
 "?     e009    NON-SPACING TILDE (ISO-IR-103 196) (character part)
 "-     e00a    NON-SPACING MACRON (ISO-IR-103 197) (character part)
 "(     e00b    NON-SPACING BREVE (ISO-IR-103 198) (character part)
 ".     e00c    NON-SPACING DOT ABOVE (ISO-IR-103 199) (character part)
 ":     e00d    NON-SPACING DIAERESIS (ISO-IR-103 200) (character part)
 "0     e00e    NON-SPACING RING ABOVE (ISO-IR-103 202) (character part)
 ""     e00f    NON-SPACING DOUBLE ACCUTE (ISO-IR-103 204) (character
                part)
 "<     e010    NON-SPACING CARON (ISO-IR-103 206) (character part)
 ",     e011    NON-SPACING CEDILLA (ISO-IR-103 203) (character part)
 ";     e012    NON-SPACING OGONEK (ISO-IR-103 206) (character part)
 "_     e013    NON-SPACING LOW LINE (ISO-IR-103 204) (character
                part)
 "=     e014    NON-SPACING DOUBLE LOW LINE (ISO-IR-38 217) (character
                part)
 "/     e015    NON-SPACING LONG SOLIDUS (ISO-IR-128 201) (character
                part)
 "i     e016    GREEK NON-SPACING IOTA BELOW (ISO-IR-55 39) (character
                part)
 "d     e017    GREEK NON-SPACING DASIA PNEUMATA (ISO-IR-55 38)
                (character part)
 "p     e018    GREEK NON-SPACING PSILI PNEUMATA (ISO-IR-55 37)
                (character part)
 ;;     e019    GREEK DASIA PNEUMATA (ISO-IR-18 92)
 ,,     e01a    GREEK PSILI PNEUMATA (ISO-IR-18 124)
 b3     e01b    GREEK SMALL LETTER MIDDLE BETA (ISO-IR-18 99)
 Ci     e01c    CIRCLE (ISO-IR-83 0294)
 f(     e01d    FUNCTION SIGN (ISO-IR-143 221)
 ed     e01e    LATIN SMALL LETTER EZH (ISO-IR-158 142)
 am     e01f    ANTE MERIDIAM SIGN (ISO-IR-149 0267)
 pm     e020    POST MERIDIAM SIGN (ISO-IR-149 0268)
 Tel    e021    TEL COMPATIBILITY SIGN (ISO-IR-149 0269)
 a+:    e022    ARABIC LETTER ALEF FINAL FORM COMPATIBILITY (IBM868 144)
 Fl     e023    DUTCH GUILDER SIGN (IBM437 159)
 GF     e024    GAMMA FUNCTION SIGN (ISO-10646-1DIS 032/032/037/122)
 >V     e025    RIGHTWARDS VECTOR ABOVE (ISO-10646-1DIS 032/032/038/046)
 !*     e026    GREEK VARIA (ISO-10646-1DIS 032/032/042/164)
 ?*     e027    GREEK PERISPOMENI (ISO-10646-1DIS 032/032/042/165)
 J<     e028    LATIN CAPITAL LETTER J WITH CARON (lowercase:
                000/000/001/240)
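
   The three-field layout of this table lends itself to mechanical
   reading.  The following Python sketch assumes what the excerpt
   shows: entry lines start with exactly one space, the code is four
   lowercase hexadecimal digits, and deeper-indented lines continue
   the preceding name.  ENTRY, SAMPLE and parse_mnemonic_table are
   hypothetical names, and the sample rows are copied from the table
   above.

```python
import re
from typing import Dict, Tuple

ENTRY = re.compile(r"^ (\S+) +([0-9a-f]{4}) +(\S.*)$")

SAMPLE = """\
 SP     0020    SPACE
 !      0021    EXCLAMATION MARK
 Nb     0023    NUMBER SIGN
 "'     e007    NON-SPACING ACUTE ACCENT (ISO-IR-103 194) (character
                part)
"""


def parse_mnemonic_table(text: str) -> Dict[str, Tuple[int, str]]:
    """Map each mnemonic to (code value, long descriptive name)."""
    table: Dict[str, Tuple[int, str]] = {}
    current = None
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            current = match.group(1)
            table[current] = (int(match.group(2), 16),
                              match.group(3).strip())
        elif line.startswith("  ") and line.strip() and current is not None:
            # Continuation line: append to the preceding name.
            code, name = table[current]
            table[current] = (code, name + " " + line.strip())
        else:
            # Blank lines and stray text end the current entry.
            current = None
    return table
```

   A continuation line with no preceding entry is ignored rather
   than guessed at.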

4.  CHARSETS

   The character mnemonics have been used to table a number of coded
   character sets.  The coded character set names are taken if possible
   from the official ISO registration description in the ISO 2375 (ECMA)
   register, or with a number like the code page number - or with an
   indication of the language or country it is being used for - using
   the country designators of ISO 3166.  For the character sets in the
   ECMA register, their ISO registration number is also given (as ISO-
   IR-xxx). Often the ISO registration number does not cover all the
   codes of a character set in use, but for instance only the graphical
   characters, where another ISO registration number covers the control
   characters; in the case of the 8-bit character sets the ISO
   registration only covers the upper graphical characters (GR).  The
   ISO registration number is here taken to indicate the full coded
   character set including control characters and lower half of the
   graphical characters, normally ISO 6429 and ASCII, respectively.

   The ISO definition of the term "coded character set" is as follows:
   "A set of unambiguous rules that establishes a character set and the
   one-to-one relationship between the characters of the set and their
   coded representation." and this definition may be subject to
   different interpretations.  This memo does not put further
   restrictions on the term of "coded character set" than the following:
   "A coded character set is a set of rules that unambiguously and
   completely determines which sequence of characters, if any, is
   represented by each possible sequence of n-bit bytes for a certain
   value of n." This implies that e.g. a coded character set extended
   with one or more other coded character sets by means of the extension
   techniques of ISO 2022 constitutes a coded character set in its own
   right.  In this memo the term "charset" is used to refer to the above
   interpretation of the ISO term "coded character set".

   A special problem is whether two characters of two different coded
   character sets with the same descriptive name, or depicted by what
   looks like the same graphic symbol, or with the same historical
   origin, really are to be regarded as the same character or not.  This
   problem has been studied in great detail in the development efforts
   that have resulted in ISO DIS 10646 and Unicode (under the heading
   "character unification").  As much as possible such results have been
   used in the construction of the code tables of this section.

4.1  Charset Naming

   The coded character set names are given in the ISO 646 invariant
   subset (83 characters), where a space in the name is replaced with an
   underline character; sometimes a hyphen is used instead of a blank,
   or the blank is eliminated, where such practice exists.  Case is not
   significant in the charset names.

4.2  Code Table Format

   The following code tables are given in a simple format to facilitate
   use of this text as program input. Programs and routines written in C
   to handle these tables are freely available from the author of this
   memo. Keywords are signified with the character "&" as the first
   character, to distinguish them from ordinary data. Numbers may be
   given in decimal, hexadecimal or octal notation; hexadecimal numbers
   are given with an "x" as the first character, and octal numbers have
   an "o" as the first character.
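
   The number notation can be sketched in a few lines of Python.  The
   function name parse_number is an assumption; only the lowercase
   prefixes "x" and "o" named above are recognized.

```python
def parse_number(token: str) -> int:
    """Read a decimal, "x"-prefixed hex or "o"-prefixed octal number."""
    if token.startswith("x"):
        return int(token[1:], 16)
    if token.startswith("o"):
        return int(token[1:], 8)
    return int(token, 10)
```

   Under this reading "32", "x20" and "o40" all denote the same code
   position.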

   The following keywords are used:

   "&charset" has one parameter defining the name of the character set.
   This is required for every character set.

   "&alias" has one parameter defining a possible alternate name for the
   character set. This is optional.

   "&g0esc", "&g1esc", "&g2esc", "&g3esc", "&c0esc", "&c1esc" each have one
   parameter indicating the string of octets used to define the
   character set as the G0, G1, G2, G3, C0 or C1 set respectively,
   according to ISO 2022 (11).  The string is to be preceded by an ESC
   character. Only the relevant parts of the table can be used with the
   definition; the charset is often coded with both
   graphical and control character sets.  If the coded character set is
   a 96-character set, it is tabled with the relevant GL set (normally
   ISO-IR-6) and with ISO 6429 as C0 and C1 (12).  If it is a 94-
   character set, it is tabled with the C0 set of ISO 6429. If it is a
   double-octet coded character set, it is tabled without control
   character sets and accompanying one-octet coded character sets, and
   the two-octet code is tabled as a G0 set.

   "&bits" has one parameter indicating the number of bits to represent
   the charset. This is optional and 8 bits is the default.

   "&code" has one parameter indicating the byte number allocated to the
   following character mnemonic. After the "&code" specification the
   characters are listed with their mnemonic in ascending order.  A
   character mnemonic of "??" indicates that the position is unused.  A
   character mnemonic of "__" indicates that the character set is not
   completely defined with the specifications in this memo.

   "&code2" has 2 parameters specifying the row and column in certain
   16-bit character sets.  The value 32 must be added to obtain the
   first and second byte respectively.  Mnemonics can be specified after
   the "&code2" specification as mentioned for the "&code"
   specification.

   "&codex" has 5 parameters, specifying the character set prefix
   string, the start row number, the end row number, the start column
   number and the end column number respectively. This is equivalent to
   specifying a series of mnemonics of the form "nrrcc" where "n" is the
   character set name prefix string, "rr" is the row number running from
   the specified start row number to the end row number, and "cc" is the
   column number running from the specified start column number to the
   end column number.  The series of mnemonics thereby created is
   allocated to code positions obtained by adding 32 to the row and
   column numbers to get the row and column octets.
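
   As a sketch of the "&codex" expansion in Python: the function name
   expand_codex is hypothetical, and the code assumes that rows and
   columns are written as two zero-padded decimal digits and that both
   ranges are inclusive.

```python
from typing import Iterator, Tuple


def expand_codex(prefix: str, row_start: int, row_end: int,
                 col_start: int, col_end: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (mnemonic, two-octet code) pairs for a "&codex" range."""
    for row in range(row_start, row_end + 1):
        for col in range(col_start, col_end + 1):
            mnemonic = f"{prefix}{row:02d}{col:02d}"
            # 32 is added to the row and column to get the two octets.
            yield mnemonic, bytes((row + 32, col + 32))
```

   For instance, row 32 and column 10 with the prefix "j" give the
   mnemonic j3210 and the octets 64 and 42.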

   "&duplicate" has a special meaning indicating that a position is
   being used for more than one character. This is an ugly convention
   but it is a sad fact of life that the same code in one coded character
   set can mean different characters. "&duplicate" takes two parameters
   - the first is the code to be duplicated, the other is the new
   mnemonic.

   "&rem" is followed by text to explain something in the table to a
   human reader.  All lines in such a remark have to start with this
   keyword.

   "&comb2" specifies a combination of two characters which signifies a
   third character.  All characters in the specification are given by
   their mnemonic.  The two combining characters must be specified
   previously in the code table.  The first combining character is
   specified as the first character after the keyword, and then the
   following pairs of characters are the second combining character and
   the result, respectively.  The specification can be repeated,
   terminated by an occurrence of a keyword.

4.3  Mnemonic charsets

   The following is compatible with current practice on the internet
   within EUnet - the European not-for-profit networking organisation in
   Europe and North Africa currently operating in 24 countries.

   The mnemonic charsets are a family of charsets which have the
   facility that within the relevant parts of the message, encoded in an
   ordinary coded character set, text may have occurrences of the
   following sequence: an intro character sequence, followed by a string
   of characters that represent a character mnemonic, as described
   below.  Similarly, the intro character sequence may be doubled,
   indicating a single occurrence of the respective symbols in decoded
   format.

   Note that many characters within a mnemonic character set may be
   represented in two different ways.  Normally the character itself is
   used, but it is also possible to use the mnemonic allocated to the
   character in a mnemonic sequence.

   In this way all characters with assigned mnemonics can be represented
   without information loss in any character set, which contains the
   invariant ISO 646 characters as a subset.  As a consequence, using a
   mnemonic character set all these characters can be generated
   uniformly on all keyboards and presented uniformly on all terminal
   equipment, whenever the real character is not available.
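
   A minimal Python sketch of encoding text into a mnemonic charset
   with ASCII as the transport and "&" as the intro.  The four-entry
   table SAMPLE_MNEMONICS is an illustrative excerpt assumed from the
   accent rules of section 2.3 (the full table is abridged here), and
   encode_mnemonic_ascii is a hypothetical name.

```python
from typing import Mapping

# Character -> mnemonic; a tiny illustrative excerpt only.
SAMPLE_MNEMONICS = {
    "\u00e9": "e'",  # e with acute
    "\u00fc": "u:",  # u with diaeresis
    "\u00e5": "aa",  # a with ring above
    "\u00f8": "o/",  # o with stroke
}


def encode_mnemonic_ascii(text: str,
                          table: Mapping[str, str] = SAMPLE_MNEMONICS,
                          intro: str = "&") -> str:
    """Replace non-ASCII characters by intro + mnemonic."""
    out = []
    for ch in text:
        if ch == intro:
            out.append(intro * 2)      # the intro is doubled
        elif ord(ch) < 128:
            out.append(ch)             # present in the transport set
        else:
            mnemonic = table[ch]       # KeyError if none is known
            if len(mnemonic) > 2:
                mnemonic = f"_{mnemonic}_"
            out.append(intro + mnemonic)
    return "".join(out)
```

   The result uses only ASCII characters, so it can be carried in any
   character set that contains the invariant ISO 646 characters and
   the chosen intro.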

   Data encoded in a mnemonic charset is intended to be read by the end
   user possibly without further treatment.  If the transport encoding
   and the presentation encoding for the user differ, it is recommended
   that the data be translated into a mnemonic representation in the
   presentation encoding.

   A mnemonic charset is specified with the name
   "mnemonic+charset+intro" where "mnemonic" is written as given and
   "charset" and "intro" are specified as described below. The mnemonic
   charset "mnemonic" is a shorthand for "mnemonic+ascii+38".  The
   mnemonic charset "mnem" is a shorthand for "mnemonic+ascii+8200".

   The use of mnemonics for Chinese characters of Chinese, Japanese or
   Korean origin is discouraged, as the probability that the end
   user equipment can deal with the original encoding is very high for
   the intended receiver, and the mnemonics for such Chinese characters
   described in this memo convey very little meaning to humans.

4.3.1  charset

   The charset is given as one of the charset names in this memo and is
   the encoding used for the transport.  It cannot be a mnemonic
   charset.

4.3.2  Intro

   The intro character sequence is given as the decimal value of the
   intro characters in the transport character set. There may be up to
   two characters used in the intro character sequence, and the decimal
   value for a two-character intro sequence is then the first character
   value multiplied by 256 to the power of the number of octets used
   in the character set, plus the second character value.  The
   recommended value is 38 for the ampersand (&) character in ASCII.
   Another common value is 29 for the control character "Group
   Separator", or 8200 for "space" followed by "backspace", which may be
   convenient when operating in some environments, and ordinary text is
   not changed.  Only the ampersand character may be chosen as intro
   from the invariant ISO 646 charset, but any character not in the
   invariant ISO 646 charset can be used as intro.  The intro
   character sequence is used for introducing character mnemonics when a
   character is not present in the mail transport character set (as
   defined by "charset").  Character mnemonics longer than two
   characters are surrounded by the underline character. The intro
   character sequence is doubled to represent one occurrence of itself.
   Characters in the mail transport character set are normally just
   represented with their encoding, but may also be represented by the
   intro character sequence and the mnemonic encoding.
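
   The arithmetic behind the intro value can be checked with a small
   Python sketch.  The function names intro_value and intro_codes are
   assumptions; the formula is the one stated above, with one octet
   per character as in ASCII by default.

```python
from typing import Sequence, Tuple


def intro_value(codes: Sequence[int], octets_per_char: int = 1) -> int:
    """Compute the decimal intro value for one or two characters."""
    if not 1 <= len(codes) <= 2:
        raise ValueError("an intro sequence has one or two characters")
    if len(codes) == 1:
        return codes[0]
    return codes[0] * 256 ** octets_per_char + codes[1]


def intro_codes(value: int, octets_per_char: int = 1) -> Tuple[int, ...]:
    """Recover the character codes; 0 means the intro is omitted."""
    if value == 0:
        return ()
    first, second = divmod(value, 256 ** octets_per_char)
    return (second,) if first == 0 else (first, second)
```

   For "space" (32) followed by "backspace" (8) this gives
   32 * 256 + 8 = 8200, matching the value quoted above.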

   If the intro character sequence is specified as 0 (zero), it is
   omitted in the transport, giving more readable content, but
   eliminating the possibility of reversibility and introducing an
   information loss.  With intro specified as 0, the underline
   characters surrounding mnemonics longer than 2 characters are also
   removed.  A mnemonic charset with the intro specified as zero is
   equivalent to the ordinary charset, e.g. "mnemonic+ascii+0" is
   equivalent to "ascii".

   The intro character can be given in a header "Mnemonic-Intro:" with
   the value given in decimal as noted above in the first parameter.
   This only has meaning if the charset can be deduced from other
   information as specified by the relevant Internet specification.
   This information has precedence over other information on the intro.


4.3.3  Compatibility

   If applications conforming to this memo interoperate with other
   versions of this memo, and encounter mnemonics that are undefined
   with this memo, they shall leave the mnemonic as it is coded. This
   provides for upward compatibility.
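
   A decoder that follows this rule could look like the Python sketch
   below.  It assumes ASCII transport with "&" as intro; the table
   SAMPLE_DECODE and the function decode_mnemonic_ascii are
   hypothetical, and doubled underlines inside long mnemonics are not
   handled.

```python
from typing import Mapping

# Mnemonic -> character; a tiny illustrative excerpt only.
SAMPLE_DECODE = {"e'": "\u00e9", "o/": "\u00f8", "aa": "\u00e5"}


def decode_mnemonic_ascii(text: str,
                          table: Mapping[str, str] = SAMPLE_DECODE,
                          intro: str = "&") -> str:
    """Expand mnemonics; undefined ones are left as they are coded."""
    out = []
    i = 0
    while i < len(text):
        if not text.startswith(intro, i):
            out.append(text[i])
            i += 1
            continue
        start = i
        i += len(intro)
        if text.startswith(intro, i):       # doubled intro
            out.append(intro)
            i += len(intro)
            continue
        if text.startswith("_", i):         # long form: _name_
            end = text.find("_", i + 1)
            mnemonic = text[i + 1:end] if end != -1 else None
            i = end + 1 if end != -1 else len(text)
        else:                               # two-character form
            mnemonic = text[i:i + 2]
            i += 2
        # Unknown or malformed sequences are copied through unchanged.
        out.append(table.get(mnemonic, text[start:i]))
    return "".join(out)
```

   Copying unknown sequences through unchanged means that text using
   mnemonics from a later version of the tables survives a round trip
   through an older decoder.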

4.3.4  Conversion Between Mnemonic Charsets

   To determine which mnemonic charsets are permitted with the use of an
   Internet specification, refer to that specification.  It may be that
   only "ASCII" or "INVARIANT" is allowed as the base charset.  ASCII is
   the most widely used character set.  INVARIANT is very robust when
   traversing gateways, but it causes trouble for, among other things,
   source code in several programming languages.  The use of other
   character sets may be limited to agreement between the communicating
   parties.

   When such an agreement has been reached, a conversion between
   different mnemonic charsets can be done according to the charset
   tables below.  Characters occurring in both coded character sets are
   simply converted to the code of the receiving coded character set.
   Characters not existing in the receiving coded character set are
   represented by the intro character sequence of the receiving coded
   character set plus the character mnemonic, as described for the
   intro character sequence.  The characters forming the mnemonic are
   translated into the receiving code, which must have these characters
   present.

   An undefined character in the originating coded character set is
   transformed into the following sequence: the intro character
   sequence, an underline, a question mark character, a "u" (for
   undefined), the hexadecimal value of the character with letters in
   lowercase (possibly more than one byte for multibyte character
   sets), and a terminating underline character.  Headers may need to
   be changed to reflect such a conversion.

   The character mnemonic "/c" has a special meaning: it specifies that
   a line is to be continued even if the next characters specify a new
   line.
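
   The conversion rules above can be sketched as follows.  This is a
   minimal illustration for single-byte coded character sets, not a
   definitive implementation.  The tiny SOURCE and TARGET tables, the
   function name, the "&" intro and the underline-delimited form used
   for mnemonics longer than two characters are assumptions; escaping
   of a literal intro character and the "/c" continuation mnemonic are
   left out.

```python
import string

# Hypothetical tables mapping code position -> mnemonic. A position that
# is absent from a table is treated as undefined in that character set.
BASIC = "&_?:" + string.digits + string.ascii_letters
TARGET = {ord(ch): ch for ch in BASIC}   # receiving coded character set
SOURCE = dict(TARGET)                    # originating coded character set
SOURCE[0xF6] = "o:"                      # one character TARGET lacks


def convert(data, source, target, intro="&"):
    """Convert bytes from the source charset to the target charset."""
    to_code = {mnemonic: code for code, mnemonic in target.items()}

    def spell(text):
        # The characters forming the sequence are translated into the
        # receiving code, which must have these characters present.
        return bytes(to_code[ch] for ch in text)

    out = bytearray()
    for code in data:
        mnemonic = source.get(code)
        if mnemonic is None:
            # Undefined in the originating set: intro, "_", "?", "u",
            # lowercase hexadecimal value, terminating "_".
            out += spell(intro + "_?u" + format(code, "02x") + "_")
        elif mnemonic in to_code:
            # Present in both sets: plain code conversion.
            out.append(to_code[mnemonic])
        elif len(mnemonic) == 2:
            out += spell(intro + mnemonic)
        else:
            # Assumed form for longer mnemonics.
            out += spell(intro + "_" + mnemonic + "_")
    return bytes(out)


print(convert(b"K\xf6ln\x81", SOURCE, TARGET))
```

   In this run the byte 0xf6 has the mnemonic "o:" in the source table
   but no position in the target table, so it is emitted as "&o:".
   The byte 0x81 is undefined in the source table and is emitted as
   "&_?u81_".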

5.  CHARSET TABLES

  &charset ISO_646.basic:1983
  &rem source: ECMA registry
  &alias ref
  &code 32
  SP ! " ?? ?? % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  ?? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ?? ?? ?? ?? _
  ?? a b c d e f g h i j k l m n o p q r s t u v w x y z
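
   As an illustration of how such a table might be read by a program,
   the following sketch parses the table above.  It assumes that
   mnemonics are separated by white space, that each mnemonic takes
   the next code position starting from the value given by "&code",
   that "??" marks an unassigned position, and that any token longer
   than one character beginning with "&" at the start of a line is a
   directive.  The function and variable names are hypothetical, and
   only the directives shown in this excerpt are handled.

```python
SAMPLE = r"""
  &charset ISO_646.basic:1983
  &rem source: ECMA registry
  &alias ref
  &code 32
  SP ! " ?? ?? % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  ?? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ?? ?? ?? ?? _
  ?? a b c d e f g h i j k l m n o p q r s t u v w x y z
"""


def parse_charsets(text):
    """Return a list of dicts with 'name', 'aliases' and 'codes' keys."""
    charsets = []
    current = None
    position = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        if first.startswith("&") and len(first) > 1:
            if first == "&charset":
                current = {"name": tokens[1], "aliases": [], "codes": {}}
                charsets.append(current)
                position = 0
            elif current is not None and first == "&alias":
                current["aliases"].append(tokens[1])
            elif current is not None and first == "&code":
                position = int(tokens[1])
            # Other directives (&rem, &g0esc, ...) are ignored here.
            continue
        if current is None:
            continue
        for mnemonic in tokens:
            if mnemonic != "??":          # "??" leaves the position unassigned
                current["codes"][position] = mnemonic
            position += 1
    return charsets


table = parse_charsets(SAMPLE)[0]
print(table["name"], table["aliases"], table["codes"][65])
```

   A lone "&" in a data row is kept as a mnemonic because only tokens
   longer than one character are treated as directives.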

  &charset INVARIANT
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " ?? ?? % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  ?? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ?? ?? ?? ?? _
  ?? a b c d e f g h i j k l m n o p q r s t u v w x y z ?? ?? ?? ?? DT


  &charset ISO_646.irv:1983
  &rem source: ECMA registry
  &alias iso-ir-2
  &alias ir
  &g0esc x2840 &g1esc x2940 &g2esc x2a40 &g3esc x2b40
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb Cu % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o p q r s t u v w x y z (! !! !) '- DT
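
   The "&g0esc" to "&g3esc" values can be turned into byte sequences
   with a short sketch.  It assumes that each value is an "x" followed
   by hexadecimal digits giving the bytes that follow the ESC
   character in an ISO 2022 designation (2/8, 2/9, 2/10 or 2/11 plus
   the final byte); that reading, and the function name, are
   assumptions rather than something stated in this excerpt.

```python
ESC = b"\x1b"


def designation(value):
    """Build an escape sequence from a value such as 'x2840'."""
    if not value.startswith("x"):
        raise ValueError("expected a value of the form x<hex digits>")
    # bytes.fromhex raises ValueError on malformed hexadecimal input.
    return ESC + bytes.fromhex(value[1:])


print(designation("x2840"))
```

   Under this assumption the G0 value "x2840" of ISO_646.irv:1983
   gives ESC followed by "(" and "@".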

  &charset BS_4730
  &rem source: ECMA registry
  &alias iso-ir-4
  &alias ISO646-GB
  &g0esc x2841 &g1esc x2941 &g2esc x2a41 &g3esc x2b41
  &alias gb
  &alias uk
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Pd DO % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o p q r s t u v w x y z (! !! !) '- DT

  &charset ANSI_X3.4-1968
  &rem source: ECMA registry
  &alias iso-ir-6
  &alias ANSI_X3.4-1986
  &alias ISO_646.irv:1991
  &g0esc x2842 &g1esc x2942 &g2esc x2a42 &g3esc x2b42
  &alias ASCII
  &alias ISO646-US
  &alias US-ASCII
  &alias us
  &alias IBM367
  &alias cp367
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb DO % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o p q r s t u v w x y z (! !! !) '? DT

  &charset NATS-SEFI
  &rem source: ECMA registry
  &alias iso-ir-8-1
  &g0esc x2843 &g1esc x2943 &g2esc x2a43 &g3esc x2b43
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb DO % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?

(53 pages deleted)

  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  SP ?? ?? ?? ?? ?? ?? ?? ?? ?? Ct .  <  (  +  !!
  &  ?? ?? ?? ?? ?? ?? ?? ?? ?? !  DO *  )  ;  NO
  -  /  ?? ?? ?? ?? ?? ?? ?? ?? BB ,  %  _  >  ?
  ?? ?? ?? ?? ?? ?? ?? ?? ?? '! :  Nb At '  =  "
  ?? a  b  c  d  e  f  g  h  i  ?? ?? ?? ?? ?? ??
  ?? j  k  l  m  n  o  p  q  r  ?? ?? ?? ?? ?? ??
  ?? '? s  t  u  v  w  x  y  z  ?? ?? ?? ?? ?? ??
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  (! A  B  C  D  E  F  G  H  I  ?? ?? ?? ?? ?? ??
  !) J  K  L  M  N  O  P  Q  R  ?? ?? ?? ?? ?? ??
  // ?? S  T  U  V  W  X  Y  Z  ?? ?? ?? ?? ?? ??
  0  1  2  3  4  5  6  7  8  9  ?? ?? ?? ?? ?? DT

ACKNOWLEDGEMENTS

   This memo has been produced with a grant from Nordisk Industrifond
   project number 91030.  I thank all of the people in the IETF 822ext
   WG for their constructive discussion and remarks on this memo. People
   from many other circles have also commented on the text and the
   tables. The following is a list of persons that I remember bringing
   forward suggestions that made me change the specifications.  My aging
   memory may have forgotten even significant contributions, and I
   apologize for that.

           Alain LaBonte'                  Alina Da Cruz
           Anders Samuelsson               Bob Smart
           Cuong Bui                       Dan Oscarsson
           David Crocker                   David Joslin
           Dick Weaver                     Dmitry V. Volodin
           Erik van der Poel               Geir Petersen
           Greg Vaudreuil                  Harald Tveit Alvestrand
           Hugh Tucker                     Isai Scheinberg
           James Do                        Jan-Michael Rynning
           Johan van Wingen                John C. Klensin
           John F. Chandler                Johnny Erikson
           Justin Bur                      Keith Moore
           Kevin Donnelly                  Kim F. Storm
           Marius Olofson                  Masahiro Sekiguchi
           Maurizio Sichera                Michael Patton
           Nandor Horvath                  Nathaniel Borenstein
           Ned Freed                       Neil Katin
           Olle Jaernefors                 Patrick Faeltstroem
           Paul Pomes                      Peter Svanberg
           Philippe-Andre' Prindeville     Randall Atkinson
           Steve Hardcastle-Kille

REFERENCES

   (1) ISO 2375 registration: "International Register of Coded Character
   Sets to be Used With Escape Sequences", European Computer
   Manufacturers Association (ECMA), Rue du Rhone 114, CH-1204 Geneve,
   Switzerland, December 1990.

   (2) ISO 2DIS 10646, Information Technology - Universal Multiple-Octet
   Coded Character Set (UCS), ISO/IEC JTC1/SC2/WG2 N783 (26 December
   1991).

   (3) ISO/IEC 9945-2.2 CD POSIX Shell and Utilities, informative annex
   F, ISO/IEC JTC1/SC22 N1063 (October 1991).

   (4) IBM National Language Support Reference Manual Volume 2,
   SE09-8002-01 (March 1990).

   (5) IBM 3174 Establishment Controller, Character Set Reference,
   GA27-3831-02 (March 1990).

   (6) IBM 3270 Information Display System Character Set Reference,
   Chapter 10, GA27-2837-9 (April 1987).

   (7) IBM DOS 3.30 Reference (Abridged) 94X9575 (February 1987).

   (8) IBM Keyboard layouts and code pages, Part Number 07G4586 (June
   1991).

   (9) HP LaserJet IIP Printer User's Manual, HP Part No. 33471-90901
   (June 1989).

   (10) Danish Standard DS 2089, Application of ISO 7-bit coded
   character set, UDC 681.3:003.62, February 1974 (withdrawn).

   (11) ISO 2022:1986 Information processing - ISO 7-bit and 8-bit coded
   character sets - Code extension techniques.

   (12) ISO 6429:1988 Information processing - ISO 7-bit and 8-bit coded
   character sets - Control functions for 7-bit and 8-bit coded
   character sets.

   (13) VAX/VMS User's Manual, Order Number: AI-Y517A-TE, April 1986.

   (14) The Unicode Standard Version 1.0 Volume 1, ISBN 0-201-56788-1
   (October 1991).

Author's Address

   Keld Simonsen
   Rationel Almen Planlaegning
   Sankt Joergens Alle 8
   DK-1615 Koebenhavn V
   Danmark

   Tel: +45 31 22 65 43
   Fax: +45 33 15 85 16

   Email: Keld.Simonsen@dkuug.dk