Michael Everson <everson@indigo.ie> wrote:
[snip]
>So anyway roundtrip conversion with 5426 is no longer a reason for encoding
>two characters for umlaut and diaeresis. Anyway they would be impossible to
>manage. What do you do when cataloguing a book by a German and a Frenchman
>both? Use two different codes? How about books in French by Germans or
>books in German by Frenchmen? &c. &c.
ISO 5426 (Extended Latin for bibliographic use) encodes discrete
characters for umlaut and diaeresis; however, the Extended Latin
character set of USMARC ["MARC" = MAchine Readable Cataloging] has
just a single character, "Umlaut (Diaeresis)." Despite the "US" in
its name, USMARC, published by the Library of Congress, is used
worldwide for library data.
So the problem of roundtrip conversion exists even for 8-bit library
character sets, when converting between UNIMARC (which specifies ISO
5426) and USMARC.
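
To illustrate why the roundtrip fails, here is a minimal Python sketch.
The names below are hypothetical symbolic labels invented for this
example; they are not the actual code values of either standard. The
point is only that a two-to-one mapping cannot be inverted.

```python
# Hypothetical symbolic labels, NOT real code values from ISO 5426 or USMARC.
ISO5426_TO_USMARC = {
    "ISO5426_UMLAUT": "USMARC_UMLAUT_DIAERESIS",
    "ISO5426_DIAERESIS": "USMARC_UMLAUT_DIAERESIS",
}


def to_usmarc(marks):
    """Map each ISO 5426 mark to the single USMARC character."""
    return [ISO5426_TO_USMARC[m] for m in marks]


def to_iso5426(marks, fallback="ISO5426_DIAERESIS"):
    """Map back; the converter must pick one ISO 5426 mark arbitrarily."""
    return [fallback if m == "USMARC_UMLAUT_DIAERESIS" else m for m in marks]


original = ["ISO5426_UMLAUT", "ISO5426_DIAERESIS"]
roundtrip = to_iso5426(to_usmarc(original))

# The umlaut/diaeresis distinction is gone after the round trip.
distinction_lost = roundtrip != original
```

Whichever fallback the converter chooses, one of the two original
characters comes back as the other.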
The umlaut/diaeresis issue was debated by the Unicode Working Group
(predecessor to the UTC) during the development of Version 1.0,
because of the different approaches in library character sets. The
decision was to have a single unified character.
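
The effect of that decision can be observed with Python's standard
unicodedata module. This is only a small sketch; the sample letters are
my own choice for illustration. A German umlauted letter and a French
letter with diaeresis both decompose to the same combining mark.

```python
import unicodedata

# Canonical decomposition (NFD) splits each letter into base + combining mark.
german = unicodedata.normalize("NFD", "\u00fc")  # u with umlaut, as in German "fuer"
french = unicodedata.normalize("NFD", "\u00ef")  # i with diaeresis, as in French "naive"

german_mark = german[1]
french_mark = french[1]

# Both languages end up with the one unified character.
same_mark = german_mark == french_mark
mark_name = unicodedata.name(german_mark)
```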
Michael commented:
>Anyway they would be impossible to manage.
Agreed.
If umlaut and diaeresis were to be separately encoded (as in ISO
5426), these would be the consequences:
* When both characters are available, users must (a) know which to
  choose for a particular context and (b) make the contextually
  correct choice consistently.
* However, it cannot be assumed that both characters would be
  supported in every application. If only one of the "double
  dot above" characters were available, most users would use it
  as both umlaut and diaeresis.
-- Joan Aliprand
Senior Analyst
The Research Libraries Group