Minor error in TR #16

From: Doug Ewell (dewell@compuserve.com)
Date: Wed Jun 14 2000 - 10:18:40 EDT


I found a small error in Technical Report #16, "UTF-EBCDIC."

In Section 3.5, "Signature," there is the following passage:

   The signature character U+FEFF (zero width no-break space) of Unicode
   transforms into the I8-byte sequence X'F1 BF B7 BF' which maps to
   X'DD 73 66 73' in UTF-EBCDIC. When this sequence is displayed
   (erroneously) using different a [sic] single-byte EBCDIC code pages,
   it can be visualized as different character strings. In Latin-1
   EBCDIC code page 1047 (and coincidentally also in Latin-1 code pages
   500 and 37), this byte sequence appears as "ùËÃÊ" (small letter u
   with grave, capital letter E with diaeresis, capital letter A with
   tilde, capital letter E with circumflex).

Since the 4-byte I8 sequence contains two 0xBF bytes, and both map to
0x73 (as of course they must), they cannot be displayed as the two
different characters 'Ë' and 'Ê'; both must appear as 'Ë'. The text
should read:

   ... this byte sequence appears as "ùËÃË" (small letter u with grave,
   capital letter E with diaeresis, capital letter A with tilde, capital
   letter E with diaeresis).
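The chain of mappings in the quoted passage can be checked mechanically. Below is a minimal Python sketch. It assumes the I8 (UTF-8-Mod) layout described in TR #16: lead bytes with UTF-8-style prefixes, and trailing bytes of the form 101xxxxx that carry 5 bits each. The function and table names are hypothetical. The I8-to-UTF-EBCDIC table is only an excerpt holding the three entries quoted above (F1 to DD, BF to 73, B7 to 66), not the full 256-entry table from the TR. Python's standard library provides codecs for code pages 37 and 500 (cp037, cp500). The sketch assumes there is no built-in codec for 1047 and uses those two instead, since the TR passage says all three agree here.

```python
# Sketch only: reproduces the U+FEFF signature example from TR #16, Section 3.5.
# Assumption: I8 trailing bytes are 101xxxxx (5 payload bits each), and this
# encoder is limited to code points below U+40000 (at most 4 I8 bytes).

def utf8_mod_encode(cp: int) -> bytes:
    """Encode one code point as I8 (UTF-8-Mod) bytes (hypothetical helper)."""
    if cp < 0xA0:
        return bytes([cp])          # U+0000..U+009F stay single bytes
    if cp < 0x400:
        n = 2                       # 5 + 5 = 10 payload bits
    elif cp < 0x4000:
        n = 3                       # 4 + 5 + 5 = 14 payload bits
    elif cp < 0x40000:
        n = 4                       # 3 + 5 + 5 + 5 = 18 payload bits
    else:
        raise ValueError("this sketch only handles code points below U+40000")
    trail = []
    for _ in range(n - 1):
        trail.append(0xA0 | (cp & 0x1F))   # 101xxxxx
        cp >>= 5
    lead_prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[n]
    return bytes([lead_prefix | cp] + trail[::-1])


# Excerpt of the I8 -> UTF-EBCDIC byte table: only the entries quoted in
# the TR passage (hypothetical name; the real table has 256 entries).
I8_TO_UTF_EBCDIC_EXCERPT = {0xF1: 0xDD, 0xBF: 0x73, 0xB7: 0x66}


def to_utf_ebcdic(i8: bytes) -> bytes:
    # The mapping is byte-for-byte, so equal I8 bytes give equal output bytes.
    return bytes(I8_TO_UTF_EBCDIC_EXCERPT[b] for b in i8)


i8 = utf8_mod_encode(0xFEFF)
ebcdic = to_utf_ebcdic(i8)
print(i8.hex(" ").upper())       # F1 BF B7 BF
print(ebcdic.hex(" ").upper())   # DD 73 66 73
for codec in ("cp037", "cp500"):
    print(codec, ebcdic.decode(codec))   # ùËÃË
```

Because the I8-to-UTF-EBCDIC step is a fixed byte-for-byte table, both 0xBF bytes necessarily become 0x73 and decode to the same character, 'Ë'.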

The stray "a" in the passage, which I marked with "[sic]", was left in
for accuracy, but it is not the error I am reporting. The TR contains
several such typos, so it would be unfair to single this one out.

-Doug Ewell
 Fullerton, California