Re: Umlaut and diaeresis

From: Alain LaBonté (alb@sct.gouv.qc.ca)
Date: Mon Jun 21 1999 - 14:41:44 EDT


At 10:55 99-06-21 -0700, Figge, Donald wrote:
>Because these two characters are unified, the composition software needs to
>be smart enough to know that a word can be divided between two vowels when
>one of them has a diaeresis mark, but not necessarily if the same mark is
>intended to serve as an umlaut.
>
>The argument that alphabetic characters are pronounced differently in
>various languages but still have the same code point misses the point of my
>original question which is why unification when the umlaut and diaeresis
>have different basic functionalities.

[Alain] I do not think that you can break words at a diaeresis
regardless of the language anyway. It would not be allowed to break
« aiguë » (other recently [1975] accepted spelling: « aigüe ») or « Noël »,
or « ambiguïté » in French, for example, before the vowel bearing the
diaeresis.

Of course I agree that you also have no right to replace a « ü » in French
by a fallback like « ue », as is occasionally done for German proper names,
which underlines that the diaeresis (tréma) and the umlaut do not, in
general, have the same properties.
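
To make the fallback point concrete, here is a minimal sketch in Python. It
is only an illustration under stated assumptions, not any existing library's
API: the function name `ascii_fallback`, the table `UMLAUT_FALLBACKS` and the
two-letter language codes are hypothetical, and only German is listed because
it is the only language for which the « ue » fallback is mentioned above.

```python
import unicodedata

# Hypothetical table: languages in which the two dots are an umlaut with a
# conventional two-letter fallback. Only German ("de") is listed here.
UMLAUT_FALLBACKS = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
}

def ascii_fallback(text, lang):
    """Apply a two-letter fallback only where the language tag allows it.

    For any language without an entry in the table (French, for example),
    the text is returned unchanged apart from NFC normalization.
    """
    # Normalize so that "u" + combining diaeresis matches the precomposed "ü".
    text = unicodedata.normalize("NFC", text)
    table = UMLAUT_FALLBACKS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)
```

With this sketch, `ascii_fallback("Müller", "de")` gives "Mueller", while
`ascii_fallback("aigüe", "fr")` leaves « aigüe » untouched: the same code
point is treated differently only because the language tag differs.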

However, the properties depend more specifically on the language, and even
a diaeresis might not have the same properties in all languages. My example
of the more recent spelling « aigüe » is an excellent case in point, as it
is far from orthodox in French: it breaks all the previous rules, which
said that a letter with a diaeresis had to be preceded by a vowel. The
« Académie française » blessed the new practice in 1975 (it has not yet
been adopted by dictionaries or grammarians, who are supposed to be more
innovative than a standardizing committee!) because so many prestigious
authors were making the spelling mistake!!! In doing this they changed the
properties of the diaeresis in French.

My conclusion: character properties *may* incidentally be generally
applicable in a given script, but one has to recognize that they, in some
measure, depend on the language in use. Language tagging is therefore of
utmost importance, beyond character coding.

One could argue that the language may be deduced from the context.
Sometimes it cannot, as in the following example:

« Jean put dire comment on tape » (en). We all know what this means in
English. The exact same sentence, using the same characters exactly, in
French, also makes sense but rather means « Jean [male in French, meaning
"John"!] was able to say how one types ». (;

Alain LaBonté
Québec


