---------------------- Information from the mail header -----------------------
Sender: "X3L2, Codes and Character Set Committee"
Poster: Tim Greenwood 603-881-0575 ZKO2-3/Q18 19-Aug-1994 1159
Subject: RFC 1345 - POSIX involvement in Keld's character mnemonic scheme?
-------------------------------------------------------------------------------

The attached, rather old, memo from Keld was recently brought to my attention. It attempts a short naming scheme for all non-ideographic and non-Hangul characters in 10646, and uses this to correlate characters in many registered coded character sets. Keld mentions that the character mnemonics are taken from the ISO committee draft of POSIX.2. I have seen these two-character abbreviations before in memos from Keld and Alain, but was not aware that POSIX was involved. Does anyone know what, if anything, POSIX is doing in this field?

The actual memo is 102 pages long so I am sending an abridged version. I have edited out most of the tables. I can send the full thing to the list or individually if you wish.

Tim

Network Working Group                                        K. Simonsen
Request for Comments: 1345                   Rationel Almen Planlaegning
                                                               June 1992

                  Character Mnemonics & Character Sets

Status of the Memo

This memo provides information for the Internet community. It does not specify an Internet standard. Distribution of this memo is unlimited.

Summary

This memo lists a selection of characters and their presence in some coded character sets. To facilitate the coded character set tabulations, an unambiguous mnemonic for each character is used, and a format for tabulating the coded character sets is defined. The coded character sets are given names for easy reference. A family of coded character sets called the mnemonic character sets is defined, together with conversion between these coded character sets without information loss. The character set names are registered with the Internet Assigned Numbers Authority (IANA).
Additional character sets not described in this memo should be registered with the IANA. This memo may be updated periodically, or additional specifications may be published, to reflect other coded character sets.

Please send any comments, including comments about the accuracy of the tables, to the author, Keld Simonsen <Keld.Simonsen@dkuug.dk>.

1. INTRODUCTION

With the growing internationalization of the Internet, support for many coded character sets is required. It is the intention of this memo to document precisely the mapping between all characters and their corresponding coded representations in various coded character sets, and to give names to these coded character sets, so they can be referenced unambiguously in Internet standards.

This memo does not indicate anything about the validity of using these specifications in any Internet standard, so you should consult each individual Internet standard to see which coded character sets and names are allowed there.

Unambiguous character mnemonics are specified, which provide a practical way of identifying a character without reference to a coded character set and its code in that coded character set. The mnemonics are written in a minimal set of characters, namely the invariant 83 graphical characters of ISO 646, which is a kind of greatest common subset to be found between the majority of coded

Simonsen                                                        [Page 1]

RFC 1345         Character Mnemonics & Character Sets          June 1992

character sets, including ASCII, national variants of the ISO 646 7-bit character set and various EBCDICs. In addition, the numeric values of the coded representations of all these characters are the same in all coded character sets compatible with ISO standards. All of them except two, EXCLAMATION MARK and QUOTATION MARK, have the same coded representation in all variants of EBCDIC. This minimal set of characters is called the reference character set in this memo.
The mnemonics can be used in Internet standards for easy and unambiguous reference, and they can also serve as a fallback representation in various Internet specifications.

The coded character sets covered include all parts of ISO 8859, ISO 6937-2 and all ISO 646 conforming coded character sets in the ISO character set registry managed by ECMA according to ISO 2375. Almost all graphic coded character sets in the ECMA registry (1) are covered. The graphic coded character sets not included are registry numbers 31, 38, 39, 53, 59, 68, 71, 72, 129 and 137. In addition, many vendor-defined character sets are covered, including PC codepages (4), (7), (8), many EBCDIC character sets (4), (5), (6) and HP, DEC and Apple character sets (8), (9), (10), (13), (14). The East-Asian 16-bit character sets from the ECMA registry are also included in this memo.

2. CHARACTER MNEMONICS

2.1 General Syntax

The character mnemonics are taken from the ISO committee draft (CD) of the POSIX.2 standard (3). They are classified into two groups:

1. A group with two-character mnemonics, primarily intended for alphabetic scripts like Latin, Greek, Cyrillic, Hebrew and Arabic, and for special characters.

2. A group with variable-length mnemonics, primarily intended for non-alphabetic scripts like Japanese and Chinese, but also used for some accented letters and special characters.

In the two-character mnemonics, all invariant graphic characters in the ISO 646 character codes except "&" are used, i.e. the following characters:

   ! " % ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
   a b c d e f g h i j k l m n o p q r s t u v w x y z

The character "_" is not used as the first character.

In the variable-length mnemonics, the character "_" is not used as the first character. If it is used in a name, its presence is doubled.
The mnemonics can be used in several different ways for different purposes. One of these is the description of coded character sets, which is detailed in section 4.2. Another is extending a given coded character set to a mnemonic character set, which is described in section 4.3. The restrictions on the use of the characters "&" and "_" are due to demands of the compositional methods of these techniques.

2.2 ISO Official Long Descriptive Character Name

For each mnemonic, the character for which it stands is indicated in the following table by a long descriptive name. This name is identical to the ISO name of the character as given in reference (2). For a few characters that are not included there, descriptive names of the same kind are introduced in this memo. The source of each character is stated in the table after the name and should be consulted for a reliable identification of the character.

These long descriptive names consist only of the capital Latin letters of the invariant part of ISO 646, the digits, "-", and SPACE. Digits are only used in names of ideographic and Hangul characters and never as the first character.

2.3 The 2-character Mnemonics

The two-character mnemonics include various accented Latin letters, Greek, Cyrillic, Hebrew, Arabic, Hiragana and Katakana. A fair number of special characters are also included. Almost all ISO or ISO registered 7- and 8-bit graphical coded character sets are covered with these two-character mnemonics.

The two characters are chosen so that their graphical appearance in the reference set resembles as much as possible (within the possibilities available) the graphical appearance of the character. The basic character set of ISO 646 is used as the reference set, as mentioned above.

The characters in the reference character set are chosen to represent themselves.
For control characters from ISO 646 the two-character acronyms of ISO 2047 are used as mnemonics. For the other control characters of ISO 6429, two-character mnemonics have been selected based on the variable-length acronyms used in that standard.

Letters, including Greek, Cyrillic, Arabic and Hebrew, are represented with the base letter as the first character, and the second character represents an accent or relation to a non-Latin script. Non-Latin letters are transliterated to Latin letters, following transliteration standards as closely as possible. This is also done with Latin letters such as ETH and THORN, and the Danish/Norwegian/Swedish letter A WITH RING ABOVE is transliterated into "aa".

After a letter, the second character signifies the following:

   Exclamation mark    !   Grave
   Apostrophe          '   Acute accent
   Greater-Than sign   >   Circumflex accent
   Question Mark       ?   Tilde
   Hyphen-Minus        -   Macron
   Left parenthesis    (   Breve
   Full Stop           .   Dot Above
   Colon               :   Diaeresis
   Comma               ,   Cedilla
   Underline           _   Underline
   Solidus             /   Stroke
   Quotation mark      "   Double acute accent
   Semicolon           ;   Ogonek
   Less-Than sign      <   Caron
   Zero                0   Ring above
   Two                 2   Hook
   Nine                9   Horn
   Equals              =   Cyrillic
   Asterisk            *   Greek
   Percent sign        %   Greek/Cyrillic special
   Plus                +   smalls: Arabic, capitals: Hebrew
   Three               3   some Latin/Greek/Cyrillic letters
   Four                4   Bopomofo
   Five                5   Hiragana
   Six                 6   Katakana

In designing the mnemonics the following special characters were reserved:

The ampersand is reserved as an intro character, indicating that the following string is in the mnemonic character set.

The underline character is reserved for the variable-length mnemonics. This use does not eliminate usage as an accent or language identifier.

Special characters are encoded with some mnemonic value. These are not systematic throughout, but most mnemonics start with a related special character of the reference set.
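
The second-character conventions of section 2.3 can be read as a simple lookup table. The following Python sketch transcribes that table and explains a two-character letter mnemonic in its terms. The names SECOND_CHAR and describe are illustrative only and do not come from the memo, and the sketch makes no claim that a given pair is actually assigned; for that, the full table of section 3 has to be consulted.

```python
# Second-character conventions for letters, transcribed from the table in
# section 2.3. SECOND_CHAR and describe() are illustrative names only.
SECOND_CHAR = {
    "!": "Grave",
    "'": "Acute accent",
    ">": "Circumflex accent",
    "?": "Tilde",
    "-": "Macron",
    "(": "Breve",
    ".": "Dot above",
    ":": "Diaeresis",
    ",": "Cedilla",
    "_": "Underline",
    "/": "Stroke",
    '"': "Double acute accent",
    ";": "Ogonek",
    "<": "Caron",
    "0": "Ring above",
    "2": "Hook",
    "9": "Horn",
    "=": "Cyrillic",
    "*": "Greek",
    "%": "Greek/Cyrillic special",
    "+": "Arabic (small letters) or Hebrew (capitals)",
    "3": "Some Latin/Greek/Cyrillic letters",
    "4": "Bopomofo",
    "5": "Hiragana",
    "6": "Katakana",
}


def describe(mnemonic: str) -> str:
    """Explain a two-character letter mnemonic in terms of the table."""
    if len(mnemonic) != 2:
        raise ValueError("expected a two-character mnemonic")
    base, second = mnemonic
    if not (base.isascii() and base.isalpha()):
        raise ValueError("the table applies only after a letter")
    meaning = SECOND_CHAR.get(second)
    if meaning is None:
        return f"{base}: second character {second!r} is not in the table"
    return f"{base} + {meaning}"


print(describe("a:"))  # a + Diaeresis
print(describe("o/"))  # o + Stroke
```

The lookup deliberately rejects mnemonics whose first character is not a letter, because the memo states the conventions only for the character following a letter; special-character mnemonics follow no such system.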
2.4 The Variable-length Character Mnemonics

The variable-length character mnemonics are primarily meant for the ideographic characters in larger Asian character sets, but are also used for accented characters with several accents and for some special characters.

Short mnemonics are preferred, because they save storage and are easier to input. The Chinese standard GB 2312-1980, the Japanese standards JIS X0208 and JIS X0212, and the Korean standard KS C 5601 all identify characters by row and column numbers between 1 and 94. So two positions each for row and column, plus a character set identifier of one character, would be almost as short as possible. The following character set identifiers are defined:

   c   GB 2312-1980
   j   JIS X0208-1990
   J   JIS X0212-1990
   k   KS C 5601-1987

This system for the representation of ideographic characters and Hangul characters is not truly mnemonic, but it provides short representations that are easy to connect to the corresponding character by means of the code table of an official character set standard. Alternative methods based on the graphic appearance or the pronunciation of the characters are thought to be unfeasible.

One prominent character in the reference character set is reserved for identifying variable-length mnemonics, namely the underline character "_". This character is intended as a delimiter at both the front and the end of the mnemonic. An example of its use would be (&=intro):

   &_j3210_ &_j4436_&_j6530_

3. CHARACTER MNEMONIC TABLE

The following table contains the character mnemonic and the encoding and long descriptive name of ISO 2DIS 10646 (2). Although ISO 10646 is only at DIS stage at the time of writing and there is quite some debate about it, the long descriptive naming in the DIS is considered to be stable and the best official ISO reference to character names.
The 2-octet encoded value of the ISO 2DIS 10646 is also used, but only as an identification of the character, and it should only be used for identification purposes as the coded representation may be changed in the final 10646 international standard. Some characters not in the ISO 2DIS 10646 are allocated values in the private use zone and given names and references to a character set where they are used.

The format of the table is:

   1st field is the character mnemonic (mostly 2 characters).
   2nd field is the ISO 2DIS 10646 code in hexadecimal.
   3rd field is the long descriptive name of ISO 2DIS 10646.

 SP     0020    SPACE
 !      0021    EXCLAMATION MARK
 "      0022    QUOTATION MARK
 Nb     0023    NUMBER SIGN
 DO     0024    DOLLAR SIGN
 %      0025    PERCENT SIGN
 &      0026    AMPERSAND
 '      0027    APOSTROPHE
 (      0028    LEFT PARENTHESIS
 )      0029    RIGHT PARENTHESIS
 *      002a    ASTERISK
 +      002b    PLUS SIGN

(35 pages deleted)

 part)
 "'     e007    NON-SPACING ACUTE ACCENT (ISO-IR-103 194) (character part)
 ">     e008    NON-SPACING CIRCUMFLEX ACCENT (ISO-IR-103 195) (character part)
 "?     e009    NON-SPACING TILDE (ISO-IR-103 196) (character part)
 "-     e00a    NON-SPACING MACRON (ISO-IR-103 197) (character part)
 "(     e00b    NON-SPACING BREVE (ISO-IR-103 198) (character part)
 ".     e00c    NON-SPACING DOT ABOVE (ISO-IR-103 199) (character part)
 ":     e00d    NON-SPACING DIAERESIS (ISO-IR-103 200) (character part)
 "0     e00e    NON-SPACING RING ABOVE (ISO-IR-103 202) (character part)
 ""     e00f    NON-SPACING DOUBLE ACCUTE (ISO-IR-103 204) (character part)
 "<     e010    NON-SPACING CARON (ISO-IR-103 206) (character part)
 ",     e011    NON-SPACING CEDILLA (ISO-IR-103 203) (character part)
 ";     e012    NON-SPACING OGONEK (ISO-IR-103 206) (character part)
 "_     e013    NON-SPACING LOW LINE (ISO-IR-103 204) (character part)
 "=     e014    NON-SPACING DOUBLE LOW LINE (ISO-IR-38 217) (character part)
 "/     e015    NON-SPACING LONG SOLIDUS (ISO-IR-128 201) (character part)
 "i     e016    GREEK NON-SPACING IOTA BELOW (ISO-IR-55 39) (character part)
 "d     e017    GREEK NON-SPACING DASIA PNEUMATA (ISO-IR-55 38) (character part)
 "p     e018    GREEK NON-SPACING PSILI PNEUMATA (ISO-IR-55 37) (character part)
 ;;     e019    GREEK DASIA PNEUMATA (ISO-IR-18 92)
 ,,     e01a    GREEK PSILI PNEUMATA (ISO-IR-18 124)
 b3     e01b    GREEK SMALL LETTER MIDDLE BETA (ISO-IR-18 99)
 Ci     e01c    CIRCLE (ISO-IR-83 0294)
 f(     e01d    FUNCTION SIGN (ISO-IR-143 221)
 ed     e01e    LATIN SMALL LETTER EZH (ISO-IR-158 142)
 am     e01f    ANTE MERIDIAM SIGN (ISO-IR-149 0267)
 pm     e020    POST MERIDIAM SIGN (ISO-IR-149 0268)
 Tel    e021    TEL COMPATIBILITY SIGN (ISO-IR-149 0269)
 a+:    e022    ARABIC LETTER ALEF FINAL FORM COMPATIBILITY (IBM868 144)
 Fl     e023    DUTCH GUILDER SIGN (IBM437 159)
 GF     e024    GAMMA FUNCTION SIGN (ISO-10646-1DIS 032/032/037/122)
 >V     e025    RIGHTWARDS VECTOR ABOVE (ISO-10646-1DIS 032/032/038/046)
 !*     e026    GREEK VARIA (ISO-10646-1DIS 032/032/042/164)
 ?*     e027    GREEK PERISPOMENI (ISO-10646-1DIS 032/032/042/165)
 J<     e028    LATIN CAPITAL LETTER J WITH CARON (lowercase: 000/000/001/240)

4. CHARSETS

The character mnemonics have been used to tabulate a number of coded character sets.
The coded character set names are taken, where possible, from the official ISO registration description in the ISO 2375 (ECMA) register. Otherwise they are formed from a number, such as the code page number, or from an indication of the language or country the set is used for, using the country designators of ISO 3166. For the character sets in the ECMA register, their ISO registration number is also given (as ISO-IR-xxx). Often the ISO registration number does not cover all the codes of a character set in use, but for instance only the graphical characters, where another ISO registration number covers the control characters; in the case of the 8-bit character sets the ISO registration only covers the upper graphical characters (GR). The ISO registration number is here taken to indicate the full coded character set including control characters and the lower half of the graphical characters, normally ISO 6429 and ASCII, respectively.

The ISO definition of the term "coded character set" is as follows: "A set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their coded representation." This definition may be subject to different interpretations.

This memo does not put further restrictions on the term "coded character set" than the following: "A coded character set is a set of rules that unambiguously and completely determines which sequence of characters, if any, is represented by each possible sequence of n-bit bytes for a certain value of n." This implies that, for example, a coded character set extended with one or more other coded character sets by means of the extension techniques of ISO 2022 constitutes a coded character set in its own right. In this memo the term "charset" is used to refer to the above interpretation of the ISO term "coded character set".
A special problem is whether two characters of two different coded character sets that have the same descriptive name, are depicted by what looks like the same graphic symbol, or have the same historical origin are really to be regarded as the same character. This problem has been studied in great detail in the development efforts that have resulted in ISO DIS 10646 and Unicode (under the heading "character unification"). As far as possible, such results have been used in the construction of the code tables of this section.

4.1 Charset Naming

The coded character set names are given in the ISO 646 invariant subset (83 characters). A space in the name is replaced with an underline character; sometimes a hyphen is used instead of a blank, or the blank is eliminated where existing practice does so. Case is not significant in the charset names.

4.2 Code Table Format

The following code tables are given in a simple format to facilitate use of this text as program input. Programs and routines written in C to handle these tables are freely available from the author of this memo.

Keywords are signified with the character "&" as the first character, to distinguish them from ordinary data. Numbers may be given in decimal, hexadecimal or octal notation; hexadecimal numbers are given with an "x" as the first character, and octal numbers have an "o" as the first character.

The following keywords are used:

"&charset" has one parameter defining the name of the character set. This is required for every character set.

"&alias" has one parameter defining a possible alternate name for the character set. This is optional.

"&g0esc", "&g1esc", "&g2esc", "&g3esc", "&c0esc", "&c1esc" each have one parameter indicating the string of octets used to define the character set as the G0, G1, G2, G3, C0 or C1 set respectively, according to ISO 2022 (11). The string is to be preceded by an ESC character.
Only the relevant parts of the table can be used with the definition; the charset is often tabled with both graphical and control character sets. If the coded character set is a 96-character set, it is tabled with the relevant GL set (normally ISO-IR-6) and with ISO 6429 as C0 and C1 (12). If it is a 94-character set, it is tabled with the C0 set of ISO 6429. If it is a double-octet coded character set, it is tabled without control character sets and accompanying one-octet coded character sets, and the two-octet code is tabled as a G0 set.

"&bits" has one parameter indicating the number of bits used to represent the charset. This is optional and 8 bits is the default.

"&code" has one parameter indicating the byte number allocated to the following character mnemonic. After the "&code" specification the characters are listed with their mnemonics in ascending order. A character mnemonic of "??" indicates that the position is unused. A character mnemonic of "__" indicates that the character set is not completely defined with the specifications in this memo.

"&code2" has 2 parameters specifying the row and column in certain 16-bit character sets. The value 32 must be added to obtain the first and second byte respectively. Mnemonics can be specified after the "&code2" specification as mentioned for the "&code" specification.

"&codex" has 5 parameters, specifying the character set prefix string, the start row number, the end row number, the start column number and the end column number respectively. This is equivalent to specifying a series of mnemonics of the form "nrrcc" where "n" is the character set name prefix string, "rr" is the row number running from the specified start row number to the end row number, and "cc" is the column number running from the specified start column number to the end column number. The series of mnemonics thus created is allocated to code positions obtained by adding 32 to the row and column numbers to get the row and column octets.
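
As an illustration of the "&codex" rule, the following Python sketch generates the series of "nrrcc" mnemonics together with the row and column octets obtained by adding 32. It rests on two assumptions not spelled out in the memo: that "rr" and "cc" are written as two-digit, zero-padded decimal numbers (consistent with the example "&_j3210_" and with rows and columns running from 1 to 94), and that the series runs row by row. The function name expand_codex is hypothetical.

```python
from typing import Iterator, Tuple


def expand_codex(prefix: str, row_start: int, row_end: int,
                 col_start: int, col_end: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (mnemonic, row_octet, column_octet) for an "&codex" range.

    Assumes two-digit zero-padded row/column numbers and row-major order.
    """
    for row in range(row_start, row_end + 1):
        for col in range(col_start, col_end + 1):
            mnemonic = f"{prefix}{row:02d}{col:02d}"
            # The memo adds 32 to the row and column numbers to get the octets.
            yield mnemonic, row + 32, col + 32


for entry in expand_codex("j", 16, 16, 1, 3):
    print(entry)
# ('j1601', 48, 33)
# ('j1602', 48, 34)
# ('j1603', 48, 35)
```

The same addition of 32 applies to the row and column parameters of "&code2".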
"&duplicate" has a special meaning indicating that a position is being used for more than one character. This is an ugly convention but it is a sad fact of life that same code in one coded character set can mean different characters. "&duplicate" takes two parameters Simonsen [Page 43] RFC 1345 Character Mnemonics & Character Sets June 1992 - the first is the code to be duplicated, the other is the new mnemonic. "&rem" is followed by text to explain something in the table to a human reader. All lines in such a remark has to start with this keyword. "&comb2" specifies a combination of two characters which signifies a third character. All characters in the specification are given by their mnemonic. The two combining characters must be specified previously in the code table. The first combining character is specified as the first character after the keyword, and then the following pairs of characters are the second combining character and the result, respectively. The specification can be repeated, terminated by an occurrence of a keyword. 4.3 Mnemonic charsets The following is compatible with current practice on the internet within EUnet - the European not-for-profit networking organisation in Europe and North Africa currently operating in 24 countries. The mnemonic charsets are a family of charsets which have the facility that within the relevant parts of the message, encoded in an ordinary coded character set, text may have occurrences of the following sequence: an intro character sequence, followed by a string of characters that represent a character mnemonic, as described below. Similarly, the intro character sequence may be doubled, indicating a single occurrence of the respective symbols in decoded format. Note that many characters within a mnemonic character set may be represented in two different ways. Normally the character itself is used, but it is also possible to use the mnemonic allocated to the character in a mnemonic sequence. 
In this way all characters with assigned mnemonics can be represented without information loss in any character set that contains the invariant ISO 646 characters as a subset. As a consequence, using a mnemonic character set, all these characters can be generated uniformly on all keyboards and presented uniformly on all terminal equipment, whenever the real character is not available.

Data encoded in a mnemonic charset is intended to be read by the end user, possibly without further treatment. If the transport encoding and the presentation encoding for the user differ, it is recommended that the data be translated into a mnemonic representation in the presentation encoding.

A mnemonic charset is specified with the name "mnemonic+charset+intro" where "mnemonic" is written as given and "charset" and "intro" are specified as described below. The mnemonic charset "mnemonic" is a shorthand for "mnemonic+ascii+38". The mnemonic charset "mnem" is a shorthand for "mnemonic+ascii+8200".

The use of mnemonics for Chinese characters of either Chinese, Japanese or Korean origin is discouraged, as the probability that the end user equipment can deal with the original encoding is very high for the intended receiver, and the mnemonics for such Chinese characters described in this memo convey very little meaning to humans.

4.3.1 charset

The charset is given as one of the charset names in this memo and is the encoding used for the transport. It cannot be a mnemonic charset.

4.3.2 Intro

The intro character sequence is given as the decimal value of the intro characters in the transport character set. There may be up to two characters in the intro character sequence, and the decimal value for a two-character intro sequence is then the first character value multiplied by 256 to the power of the number of octets used in the character set, plus the second character value.
The recommended value is 38 for the ampersand (&) character in ASCII. Other common values are 29 for the control character "Group Separator", and 8200 for "space" followed by "backspace", which may be convenient when operating in some environments, since ordinary text is not changed. Only the ampersand character may be chosen as intro from the invariant ISO 646 charset, but any character not in the invariant ISO 646 character set can be used as intro.

The intro character sequence is used for introducing character mnemonics when a character is not present in the mail transport character set (as defined by "charset"). Character mnemonics longer than two characters are surrounded by the underline character. The intro character sequence is doubled to represent one occurrence of itself. Characters in the mail transport character set are normally just represented with their encoding, but may also be represented by the intro character sequence and the mnemonic encoding.

If the intro character sequence is specified as 0 (zero), it is omitted in the transport, giving more readable content, but eliminating the possibility of reversibility and introducing an information loss. With intro specified as 0, the underline characters surrounding mnemonics longer than 2 characters are also removed. A mnemonic charset with the intro specified as zero is equivalent to the ordinary charset, e.g. "mnemonic+ascii+0" is equivalent to "ascii".

The intro character can be given in a header "Mnemonic-Intro:" with the value given in decimal as noted above in the first parameter. This only has meaning if the charset can be deduced from other information as specified by the relevant Internet specification. This information has precedence over other information on the intro.
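
The intro rules of section 4.3.2 can be made concrete with a short Python sketch. The first function computes the decimal intro value from one or two character values; the second splits a text in a mnemonic charset into literal characters and mnemonics. Both function names are hypothetical. The tokenizer assumes a non-zero intro and does not handle a doubled underline inside a variable-length mnemonic, and it does not look mnemonics up in the table of section 3.

```python
from typing import List, Sequence, Tuple


def intro_value(code_values: Sequence[int], octets_per_char: int = 1) -> int:
    """Decimal intro value for one or two intro characters."""
    if len(code_values) == 1:
        return code_values[0]
    if len(code_values) == 2:
        first, second = code_values
        # first value times 256 to the power of the octets per character,
        # plus the second value
        return first * 256 ** octets_per_char + second
    raise ValueError("an intro sequence has one or two characters")


def tokenize(text: str, intro: str = "&") -> List[Tuple[str, str]]:
    """Split text into ("char", c) and ("mnemonic", m) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if not text.startswith(intro, i):
            tokens.append(("char", text[i]))
            i += 1
            continue
        i += len(intro)
        if text.startswith(intro, i):
            # Doubled intro: one literal occurrence of the intro itself.
            tokens.append(("char", intro))
            i += len(intro)
        elif text.startswith("_", i):
            # Variable-length mnemonic, delimited by underlines.
            end = text.find("_", i + 1)
            if end == -1:
                raise ValueError("unterminated variable-length mnemonic")
            tokens.append(("mnemonic", text[i + 1:end]))
            i = end + 1
        else:
            # Two-character mnemonic.
            if i + 2 > len(text):
                raise ValueError("truncated two-character mnemonic")
            tokens.append(("mnemonic", text[i:i + 2]))
            i += 2
    return tokens


print(intro_value([38]))     # 38, ampersand in ASCII
print(intro_value([32, 8]))  # 8200, space followed by backspace in ASCII
print(tokenize("a&a:&&b&_j3210_"))
# [('char', 'a'), ('mnemonic', 'a:'), ('char', '&'), ('char', 'b'), ('mnemonic', 'j3210')]
```

The tokenizer can tell the two mnemonic forms apart because "_" is never the first character of a mnemonic, so an underline directly after the intro can only open a variable-length mnemonic.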
4.3.3 Compatibility

If applications conforming to this memo interoperate with other versions of this memo, and encounter mnemonics that are undefined in this memo, they shall leave the mnemonic as it is coded. This provides for upward compatibility.

4.3.4 Conversion Between Mnemonic Charsets

To determine which mnemonic charsets are permitted with the use of an Internet specification, please refer to that specification. It may be that only "ASCII" or "INVARIANT" is allowed as the base charset. ASCII is the most used character set, while INVARIANT will be very robust for traversing gateways, but it will cause trouble for (amongst other things) source code for several programming languages.

The use of other character sets may be limited to agreement between the communicating parties. When such an agreement has been achieved, a conversion between different mnemonic charsets can be done according to the charset tables below: characters occurring in both encodings are just transformed, and characters not existing in the receiving coded character set are represented by the intro character sequence of the receiving coded character set plus the character mnemonic, as described for the intro character sequence. The characters forming the mnemonic are translated into the receiving code, which must have these characters present. An undefined character in the originating coded character set is transformed into the following sequence: the intro character sequence, an underline, a question mark character, a "u" (for undefined), then the hexadecimal value of the character with letters in lowercase (possibly more than one byte for multibyte character sets), and then a terminating underline character.

Headers may need to be changed accordingly to reflect such conversion.
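
The conversion rules of section 4.3.4 can be sketched in Python as follows. The sketch assumes that the receiving charset is available as a mapping from mnemonic to the character it encodes; that mapping, and the names emit and encode_undefined, are hypothetical stand-ins for tables built from the charset definitions.

```python
from typing import Mapping


def emit(mnemonic: str, receiving: Mapping[str, str], intro: str = "&") -> str:
    """Represent one character, given by its mnemonic, in the receiving charset."""
    if mnemonic in receiving:
        char = receiving[mnemonic]
        # The intro itself is doubled to represent one occurrence of itself.
        return intro * 2 if char == intro else char
    if len(mnemonic) > 2:
        # Mnemonics longer than two characters are surrounded by underlines.
        return f"{intro}_{mnemonic}_"
    return intro + mnemonic


def encode_undefined(raw: bytes, intro: str = "&") -> str:
    """Represent an undefined code as intro, "_?u", lowercase hex, "_"."""
    return f"{intro}_?u{raw.hex()}_"


# Toy receiving charset containing only two characters (an assumption).
receiving = {"a": "a", "&": "&"}
print(emit("a", receiving))          # a
print(emit("a:", receiving))         # &a:
print(emit("j3210", receiving))      # &_j3210_
print(emit("&", receiving))          # &&
print(encode_undefined(b"\x8e\xa1"))  # &_?u8ea1_
```

A full converter would first decode the originating text into mnemonics using the originating charset table, and then call a function like emit for each one.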
The character mnemonic "/c" has a special meaning: it specifies that a line is to be continued even if the following characters specify a new line.

5. CHARSET TABLES

  &charset ISO_646.basic:1983
  &rem source: ECMA registry
  &alias ref
  &code 32
  SP ! " ?? ?? % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  ?? A B C D E F G H I J K L M N O
  P Q R S T U V W X Y Z ?? ?? ?? ?? _
  ?? a b c d e f g h i j k l m n o
  p q r s t u v w x y z

  &charset INVARIANT
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " ?? ?? % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  ?? A B C D E F G H I J K L M N O
  P Q R S T U V W X Y Z ?? ?? ?? ?? _
  ?? a b c d e f g h i j k l m n o
  p q r s t u v w x y z ?? ?? ?? ?? DT

  &charset ISO_646.irv:1983
  &rem source: ECMA registry
  &alias iso-ir-2
  &alias ir
  &g0esc x2840 &g1esc x2940 &g2esc x2a40 &g3esc x2b40
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb Cu % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O
  P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o
  p q r s t u v w x y z (! !! !) '- DT

  &charset BS_4730
  &rem source: ECMA registry
  &alias iso-ir-4
  &alias ISO646-GB
  &g0esc x2841 &g1esc x2941 &g2esc x2a41 &g3esc x2b41
  &alias gb
  &alias uk
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Pd DO % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O
  P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o
  p q r s t u v w x y z (! !! !) '- DT

  &charset ANSI_X3.4-1968
  &rem source: ECMA registry
  &alias iso-ir-6
  &alias ANSI_X3.4-1986
  &alias ISO_646.irv:1991
  &g0esc x2842 &g1esc x2942 &g2esc x2a42 &g3esc x2b42
  &alias ASCII
  &alias ISO646-US
  &alias US-ASCII
  &alias us
  &alias IBM367
  &alias cp367
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb DO % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  At A B C D E F G H I J K L M N O
  P Q R S T U V W X Y Z <( // )> '> _
  '! a b c d e f g h i j k l m n o
  p q r s t u v w x y z (! !! !) '? DT

  &charset NATS-SEFI
  &rem source: ECMA registry
  &alias iso-ir-8-1
  &g0esc x2843 &g1esc x2943 &g2esc x2a43 &g3esc x2b43
  &code 0
  NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  SP ! " Nb DO % & ' ( ) * + , - . /
  0 1 2 3 4 5 6 7 8 9 : ; < = > ?

(53 pages deleted)

  DL D1 D2 D3 D4 NK SY EB CN EM SB EC FS GS RS US
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  SP ?? ?? ?? ?? ?? ?? ?? ?? ?? Ct . < ( + !!
  & ?? ?? ?? ?? ?? ?? ?? ?? ?? ! DO * ) ; NO
  - / ?? ?? ?? ?? ?? ?? ?? ?? BB , % _ > ?
  ?? ?? ?? ?? ?? ?? ?? ?? ?? '! : Nb At ' = "
  ?? a b c d e f g h i ?? ?? ?? ?? ?? ??
  ?? j k l m n o p q r ?? ?? ?? ?? ?? ??
  ?? '? s t u v w x y z ?? ?? ?? ?? ?? ??
  ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  (! A B C D E F G H I ?? ?? ?? ?? ?? ??
  !) J K L M N O P Q R ?? ?? ?? ?? ?? ??
  // ?? S T U V W X Y Z ?? ?? ?? ?? ?? ??
  0 1 2 3 4 5 6 7 8 9 ?? ?? ?? ?? ?? DT

ACKNOWLEDGEMENTS

This memo has been produced with a grant from Nordisk Industrifond project number 91030.

I thank all of the people in the IETF 822ext WG for their constructive discussion and remarks on this memo. People from many other circles have also commented on the text and the tables.
The following is a list of persons that I remember bringing forward suggestions that made me change the specifications. My aging memory may have forgotten even significant contributions, and I apologize for that.

   Alain LaBonte'
   Alina Da Cruz
   Anders Samuelsson
   Bob Smart
   Cuong Bui
   Dan Oscarsson
   David Crocker
   David Joslin
   Dick Weaver
   Dmitry V. Volodin
   Erik van der Poel
   Geir Petersen
   Greg Vaudreuil
   Harald Tveit Alvestrand
   Hugh Tucker
   Isai Scheinberg
   James Do
   Jan-Michael Rynning
   Johan van Wingen
   John C. Klensin
   John F. Chandler
   Johnny Erikson
   Justin Bur
   Keith Moore
   Kevin Donnelly
   Kim F. Storm
   Marius Olofson
   Masahiro Sekiguchi
   Maurizio Sichera
   Michael Patton
   Nandor Horvath
   Nathaniel Borenstein
   Ned Freed
   Neil Katin
   Olle Jaernefors
   Patrick Faeltstroem
   Paul Pomes
   Peter Svanberg
   Philippe-Andre' Prindeville
   Randall Atkinson
   Steve Hardcastle-Kille

REFERENCES

(1) ISO 2375 registration: "International Register of Coded Character Sets to be Used With Escape Sequences", European Computer Manufacturers Association (ECMA), Rue du Rhone 114, CH-1204 Geneve, Switzerland, December 1990.

(2) ISO 2DIS 10646, Information Technology - Universal Multiple-Octet Coded Character Set (UCS), ISO/IEC JTC1/SC2/WG2 N783 (26 December 1991).

(3) ISO/IEC 9945-2.2 CD POSIX Shell and Utilities, informative annex F, ISO/IEC JTC1/SC22 N1063 (October 1991).

(4) IBM National Language Support Reference Manual Volume 2, SE09-8002-01 (March 1990).

(5) IBM 3174 Establishment Controller, Character Set Reference, GA27-3831-02 (March 1990).

(6) IBM 3270 Information Display System Character Set Reference, Chapter 10, GA27-2837-9 (April 1987).

(7) IBM DOS 3.30 Reference (Abridged) 94X9575 (February 1987).

(8) IBM Keyboard layouts and code pages, Part Number 07G4586 (June 1991).

(9) HP LaserJet IIP Printer User's Manual, HP Part No. 33471-90901 (June 1989).

(10) Danish Standard DS 2089, Application of ISO 7-bit coded character set, UDC 681.3:003.62, February 1974 (withdrawn).

(11) ISO 2022:1986 Information processing - ISO 7-bit and 8-bit coded character sets - Code extension techniques.

(12) ISO 6429:1988 Information processing - ISO 7-bit and 8-bit coded character sets - Control functions for 7-bit and 8-bit coded character sets.

(13) VAX/VMS User's Manual, Order Number: AI-Y517A-TE, April 1986.

(14) The Unicode Standard Version 1.0 Volume 1, ISBN 0-201-56788-1 (October 1991).

Author's Address

   Keld Simonsen
   Rationel Almen Planlaegning
   Sankt Joergens Alle 8
   DK-1615 Koebenhavn V
   Danmark

   Tel: +45 31 22 65 43
   Fax: +45 33 15 85 16
   Email: Keld.Simonsen@dkuug.dk