A warning to all who have code that does an ISO 8859-7 -> UCS
conversion:
The old preliminary ISO 8859 tables on
ftp://ftp.unicode.org/Public/MAPPINGS/ have been superseded by the new
ISO 8859:1999 editions, which now contain official UCS mappings. There
seems to be only one major change: With the publication of FCD 8859-7
(Latin/Greek), two bytes were mapped differently than the Unicode
tables had foreseen:
Remap 0xA1 to U+2018 (instead of U+02BD)
Remap 0xA2 to U+2019 (instead of U+02BC)
You should probably update your conversion tables accordingly.
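As a rough illustration of the table update, here is a minimal Python
sketch. The four-entry OLD_TABLE fragment and the names update_table
and decode are made up for this example; a real converter would carry
the full 8859-7 table in whatever form it already uses.

```python
# Hypothetical fragment of an old ISO 8859-7 decoding table
# (byte value -> UCS code point). Only four entries are shown.
OLD_TABLE = {
    0xA0: 0x00A0,  # NO-BREAK SPACE
    0xA1: 0x02BD,  # old mapping: MODIFIER LETTER REVERSED COMMA
    0xA2: 0x02BC,  # old mapping: MODIFIER LETTER APOSTROPHE
    0xA3: 0x00A3,  # POUND SIGN
}

# The two remappings described in this message.
REMAP = {
    0xA1: 0x2018,  # LEFT SINGLE QUOTATION MARK
    0xA2: 0x2019,  # RIGHT SINGLE QUOTATION MARK
}


def update_table(table):
    """Return a copy of the table with the two bytes remapped."""
    new_table = dict(table)
    new_table.update(REMAP)
    return new_table


def decode(data, table):
    """Decode a bytes object using a byte -> code point table."""
    return "".join(chr(table[b]) for b in data)


NEW_TABLE = update_table(OLD_TABLE)
```

The old table is copied rather than patched in place, so both versions
stay available for comparison.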
If you do a UCS -> ISO 8859-7 conversion, then your software should
map both the old and the new UCS characters to the same byte:
U+2018, U+02BD -> 0xA1
U+2019, U+02BC -> 0xA2
UCS -> something-else mapping software should always be prepared to
perform non-injective (for US speakers: many-to-one) mappings.
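A many-to-one encoder for the two affected bytes could look roughly
like the Python sketch below. The names ENCODE, encode_char and encode
are invented for illustration, and everything outside ASCII and the
four code points discussed here is simply rejected, which a real
converter would of course not do.

```python
# Several UCS code points may map to the same ISO 8859-7 byte.
ENCODE = {
    0x2018: 0xA1,  # new mapping
    0x02BD: 0xA1,  # old mapping, still accepted
    0x2019: 0xA2,  # new mapping
    0x02BC: 0xA2,  # old mapping, still accepted
}


def encode_char(ch):
    """Map one character to its ISO 8859-7 byte value."""
    cp = ord(ch)
    if cp in ENCODE:
        return ENCODE[cp]
    if cp < 0x80:
        # The lower half of ISO 8859-7 is identical to ASCII.
        return cp
    # Assumption for this sketch: anything else is not handled.
    raise ValueError("no mapping in this sketch for U+%04X" % cp)


def encode(text):
    """Encode a string to bytes using the many-to-one table."""
    return bytes(encode_char(ch) for ch in text)
```

With this table, encode("\u2018x\u2019") and encode("\u02bdx\u02bc")
both give b"\xa1x\xa2", so text produced with the old tables still
round-trips to the same bytes.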
This change seems to be related to the apostrophe semantics errata in
Unicode 2.1 (UTR #8):
http://www.unicode.org/unicode/reports/tr8.html
which says in section 3.6 "Apostrophe Semantics Errata":
U+02BC MODIFIER LETTER APOSTROPHE is preferred where the character
is to represent a modifier letter (for example, in transliterations
to indicate a glottal stop.) In the latter case, it is also referred
to as a letter apostrophe.
U+2019 RIGHT SINGLE QUOTATION MARK is preferred where the character
is to represent a punctuation mark, as in "We've been here before."
In the latter case, U+2019 is also referred to as a punctuation
apostrophe.
Markus
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>