Umlaut and diaeresis

From: Joan Aliprand (BR.JMA@rlg.org)
Date: Tue Jun 15 1999 - 14:11:05 EDT


Michael Everson <everson@indigo.ie> wrote:

[snip]

>So anyway roundtrip conversion with 5426 is no longer a reason for encoding
>two characters for umlaut and diaeresis. Anyway they would be impossible to
>manage. What do you do when cataloguing a book by a German and a Frenchman
>both? Use two different codes? How about books in French by Germans or
>books in German by Frenchmen? &c. &c.

ISO 5426 (Extended Latin for bibliographic use) encodes discrete
characters for umlaut and diaeresis; however, the Extended Latin
character set of USMARC ["MARC" = MAchine Readable Cataloging] has
just a single character, "Umlaut (Diaeresis)." Despite the "US" in
its name, USMARC, published by the Library of Congress, is used
worldwide for library data.

So the problem of roundtrip conversion exists even between 8-bit
library character sets: when data is converted from UNIMARC (which
specifies ISO 5426) to USMARC and back, the distinction between
umlaut and diaeresis is lost.
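Why the roundtrip fails can be shown with a toy mapping. This is a minimal sketch: the labels below are hypothetical symbolic names, not the real ISO 5426 or USMARC byte values. Two source codes collapse into one target code, so the reverse mapping has to pick one of them arbitrarily.

```python
# Hypothetical symbolic labels, NOT real ISO 5426 or USMARC code values.
ISO5426_TO_USMARC = {
    "UMLAUT": "UMLAUT_DIAERESIS",     # both ISO 5426 characters...
    "DIAERESIS": "UMLAUT_DIAERESIS",  # ...map to the single USMARC character
}

# The reverse direction can only choose one target; UMLAUT is an arbitrary pick.
USMARC_TO_ISO5426 = {"UMLAUT_DIAERESIS": "UMLAUT"}


def roundtrip(codes):
    """Convert ISO 5426-style labels to USMARC-style labels and back again."""
    usmarc = [ISO5426_TO_USMARC[c] for c in codes]
    return [USMARC_TO_ISO5426[c] for c in usmarc]


record = ["UMLAUT", "DIAERESIS"]
print(roundtrip(record))  # ['UMLAUT', 'UMLAUT']: the diaeresis did not survive
```

Whichever default the reverse table chooses, one of the two original characters is replaced by the other.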

The umlaut/diaeresis issue was debated by the Unicode Working Group
(predecessor to the UTC) during the development of Version 1.0,
because of the different approaches in library character sets. The
decision was to have a single unified character.
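The effect of that unification can be observed in current Unicode data through Python's standard `unicodedata` module. This is a sketch using present-day normalization (NFD), which postdates the Version 1.0 debate. Canonical decomposition of a German umlaut letter and a French diaeresis letter yields the same combining mark.

```python
import unicodedata

umlaut_word = "f\u00fcr"      # "für": u with umlaut (German)
diaeresis_word = "No\u00ebl"  # "Noël": e with diaeresis (French)

# NFD splits each precomposed letter into base letter + combining mark;
# keep only the combining marks (nonzero canonical combining class).
marks = {
    ch
    for word in (umlaut_word, diaeresis_word)
    for ch in unicodedata.normalize("NFD", word)
    if unicodedata.combining(ch)
}

print([unicodedata.name(m) for m in marks])  # ['COMBINING DIAERESIS']
```

Both words contribute the same code point, U+0308, so the text alone does not record whether a given mark is an umlaut or a diaeresis.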

Michael commented:
>Anyway they would be impossible to manage.

Agreed.

If umlaut and diaeresis were to be separately encoded (as in ISO
5426), these would be the consequences:
* When both characters are available, users must (a) know which to
  choose for a particular context and (b) make the contextually
  correct choice consistently.
* However, it cannot be assumed that both characters would be
  supported in every application. If only one of the "double
  dot above" characters were available, most users would use it
  as both umlaut and diaeresis.

-- Joan Aliprand
   Senior Analyst
   The Research Libraries Group

To: UNICODE@UNICODE.ORG