ISO 8859-7 table changed

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Fri Oct 29 1999 - 06:22:51 EDT


A warning to all who have code that does an ISO 8859-7 -> UCS
conversion:

The old preliminary ISO 8859 tables on
ftp://ftp.unicode.org/Public/MAPPINGS/ have been superseded and
updated by the new ISO 8859:1999 editions, which now contain official
UCS mappings. There seems to be only one major change: with the
publication of FCD 8859-7 (Latin/Greek), two bytes were mapped
differently than the Unicode tables had foreseen:

   Remap 0xA1 to U+2018 (instead of U+02BD)
   Remap 0xA2 to U+2019 (instead of U+02BC)

You should probably update your conversion tables accordingly.
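
A minimal Python sketch of such a table update follows. The names
OLD_DECODE, REMAP_1999, update_decode_table and decode are hypothetical,
the table holds only the two affected bytes, and the fallback to U+FFFD
for other high bytes is a simplification made for this illustration.

```python
# Hypothetical excerpt of a pre-1999 byte -> code point table.
# Only the two affected bytes are listed; a real table has many more rows.
OLD_DECODE = {0xA1: 0x02BD, 0xA2: 0x02BC}

# The two remappings announced above.
REMAP_1999 = {0xA1: 0x2018, 0xA2: 0x2019}


def update_decode_table(table):
    """Return a copy of a byte -> code point table with the remap applied."""
    updated = dict(table)
    updated.update(REMAP_1999)
    return updated


def decode(data, table):
    """Decode bytes using a (partial) byte -> code point table.

    Assumption for this sketch: bytes below 0x80 are ASCII and map to
    themselves; high bytes missing from the table become U+FFFD.
    """
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))
        else:
            chars.append(chr(table.get(byte, 0xFFFD)))
    return "".join(chars)


NEW_DECODE = update_decode_table(OLD_DECODE)
```

With the updated table, decode(b"\xa1x\xa2", NEW_DECODE) yields the
string U+2018, "x", U+2019, while the old table still yields U+02BD
and U+02BC for the same bytes.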

If you do a UCS -> ISO 8859-7 conversion, then your software should
map both the old and the new UCS characters to the same byte:

  U+2018, U+02BD -> 0xA1
  U+2019, U+02BC -> 0xA2

UCS -> something-else mapping software should always be prepared to
perform non-injective (for US speakers: many-to-one) mappings.
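
A many-to-one encoder for the two affected bytes could be sketched in
Python as follows. The names ENCODE and encode are hypothetical, the
table covers only the four code points listed above, and raising
ValueError for anything else outside ASCII is an assumption of this
sketch, not a recommendation.

```python
# Code point -> byte table. Two code points share each target byte,
# so the mapping is deliberately not injective.
ENCODE = {
    0x2018: 0xA1,  # LEFT SINGLE QUOTATION MARK (new mapping)
    0x02BD: 0xA1,  # MODIFIER LETTER REVERSED COMMA (old mapping)
    0x2019: 0xA2,  # RIGHT SINGLE QUOTATION MARK (new mapping)
    0x02BC: 0xA2,  # MODIFIER LETTER APOSTROPHE (old mapping)
}


def encode(text, table=ENCODE):
    """Encode text to bytes using a (partial) code point -> byte table.

    Assumption for this sketch: ASCII passes through unchanged and any
    other code point missing from the table is an error.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp in table:
            out.append(table[cp])
        else:
            raise ValueError("no mapping for U+%04X" % cp)
    return bytes(out)
```

Both encode("\u2018a\u2019") and encode("\u02bda\u02bc") produce the
same three bytes 0xA1 0x61 0xA2, so text written against either
version of the table still converts.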

This change seems to be related to the apostrophe semantics errata in
Unicode 2.1, as described in UTR #8:

  http://www.unicode.org/unicode/reports/tr8.html

which says in section 3.6 "Apostrophe Semantics Errata":

  U+02BC MODIFIER LETTER APOSTROPHE is preferred where the character
  is to represent a modifier letter (for example, in transliterations
  to indicate a glottal stop.) In the latter case, it is also referred
  to as a letter apostrophe.

  U+2019 RIGHT SINGLE QUOTATION MARK is preferred where the character
  is to represent a punctuation mark, as in "We've been here before."
  In the latter case, U+2019 is also referred to as a punctuation
  apostrophe.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>