Umlaut and diaeresis

From: Joan Aliprand (BR.JMA@rlg.org)
Date: Tue Jun 15 1999 - 14:11:05 EDT

Michael Everson <everson@indigo.ie> wrote:

[snip]

>So anyway roundtrip conversion with 5426 is no longer a reason for encoding
>two characters for umlaut and diaeresis. Anyway they would be impossible to
>manage. What do you do when cataloguing a book by a German and a Frenchman
>both? Use two different codes? How about books in French by Germans or
>books in German by Frenchmen? &c. &c.

ISO 5426 (Extended Latin for bibliographic use) encodes discrete
characters for umlaut and diaeresis; however, the Extended Latin
character set of USMARC ["MARC" = MAchine Readable Cataloging] has
just a single character, "Umlaut (Diaeresis)." Despite the "US" in
its name, USMARC, published by the Library of Congress, is used
worldwide for library data.

So the problem of roundtrip conversion exists even for 8-bit library
character sets, when converting between UNIMARC (which specifies ISO
5426) and USMARC.
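
To make the roundtrip problem concrete, here is a minimal sketch in
Python. The three constants are symbolic stand-ins that I made up for
illustration; they are not the actual code points of either standard,
and the choice of diaeresis as the reverse mapping is arbitrary.

```python
# Hypothetical symbolic names, NOT real ISO 5426 or USMARC byte values.
ISO5426_UMLAUT = "iso5426:umlaut"
ISO5426_DIAERESIS = "iso5426:diaeresis"
USMARC_UMLAUT_DIAERESIS = "usmarc:umlaut-diaeresis"

# Forward conversion: two distinct source characters collapse into one.
TO_USMARC = {
    ISO5426_UMLAUT: USMARC_UMLAUT_DIAERESIS,
    ISO5426_DIAERESIS: USMARC_UMLAUT_DIAERESIS,
}

# Reverse conversion: a single target must be chosen (arbitrarily here).
FROM_USMARC = {USMARC_UMLAUT_DIAERESIS: ISO5426_DIAERESIS}


def round_trip(mark):
    """Convert an ISO 5426 mark to USMARC and back again."""
    return FROM_USMARC[TO_USMARC[mark]]


# Record which source characters survive the roundtrip unchanged.
survives = {mark: round_trip(mark) == mark for mark in TO_USMARC}
```

Whichever character the reverse table picks, the other one cannot
survive the trip, because the distinction is already gone after the
forward conversion.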

The umlaut/diaeresis issue was debated by the Unicode Working Group
(predecessor to the UTC) during the development of Version 1.0,
because of the different approaches in library character sets. The
decision was to have a single unified character.
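
The effect of that unification can be observed with Python's standard
unicodedata module. This is only an illustrative sketch; the two sample
words and the helper name combining_marks are my own choices.

```python
import unicodedata

german = "f\u00fcr"    # "für": the double dot is an umlaut
french = "na\u00efve"  # "naïve": the double dot is a diaeresis


def combining_marks(text):
    """Return the Unicode names of the combining marks in the decomposed text."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]


german_marks = combining_marks(german)
french_marks = combining_marks(french)
```

Both words decompose to a base letter followed by the same character,
U+0308 COMBINING DIAERESIS, so the encoded text itself does not say
whether the mark functions as an umlaut or as a diaeresis.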

Michael commented:
>Anyway they would be impossible to manage.

Agreed.

If umlaut and diaeresis were to be separately encoded (as in ISO
5426), these are the consequences:
* When both characters are available, users must (a) know which to
  choose for a particular context and (b) make the contextually
  correct choice consistently.
* However, it cannot be assumed that both characters would be
  supported in every application. If only one of the "double
  dot above" characters were available, it would be used as both
  umlaut and diaeresis by most users.

-- Joan Aliprand
Senior Analyst
The Research Libraries Group

To: UNICODE@UNICODE.ORG

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT