Re: How to remove accents while conforming to language standards?

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Fri, 01 Nov 2013 19:32:44 +0200

2013-11-01 17:37, Jennifer Wong wrote:

> I would like to ask for advice on removing accents from characters.

To address first the question you ask in the Subject line, “How to
remove accents while conforming to language standards?”, but do not ask
in the message body, the answer is: You can’t. Well, except in cases
where language standards permit the omission. For example, according to
modern French orthography standards, the circumflex in “fraîche” could
and should be dropped (though it is still very common to keep it).

> While the normalization process is straight forward (NFD, remove
> accents),

NFD does *not* remove accents. It is decomposition, not destruction. It
decomposes, say, “å” to “a” followed by a combining ring above. If you
then have your own code remove the combining marks, that’s a different
issue, and generally a wrong thing to do.
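
A minimal Python sketch of that distinction, using the standard
unicodedata module. The function names decompose and
strip_combining_marks are hypothetical, chosen for illustration only,
and the second function shows the separate, lossy step, not something
NFD does by itself.

```python
import unicodedata


def decompose(text: str) -> str:
    """Apply NFD: split precomposed characters into base + combining marks."""
    return unicodedata.normalize("NFD", text)


def strip_combining_marks(text: str) -> str:
    """Decompose, then drop every character with a nonzero combining class.

    This is an extra step on top of NFD, and it discards information.
    """
    return "".join(
        ch for ch in decompose(text) if not unicodedata.combining(ch)
    )


# U+00E5 is the precomposed "a with ring above".
decomposed = decompose("\u00e5")
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0061', 'U+030A']

# The accent is gone only after the explicit removal step.
print(strip_combining_marks("fra\u00eeche"))  # fraiche
```

After NFD alone, the ring is still present as U+030A; nothing has been
removed until the second function filters it out.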

> For example,
> Danish, "å" should be mapped to "aa", not "a".

“Should” as per which standard or policy? It is generally accepted for
Danish to replace “å” by “aa” if you cannot use “å”. But what might be
the situation, in the year 2013, where you really cannot use “å”?

> Likewise, in German, "ä"
> "ö" "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a",
> "e", "u". Are there common practices on how to handle these special
> cases?

There are various language-specific practices. They are not universal.
For example, in Spanish texts, I don’t think many people would find it
acceptable to replace “ü” by “ue”, rather than just “u”, if some evil
powers force you to stick to Ascii characters.
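
If such a restriction really applies, the language dependence could be
sketched in Python roughly as follows. This is only an illustration
under stated assumptions: the FALLBACKS table and the ascii_fallback
function are hypothetical, the table holds just the lowercase
replacements mentioned in this thread, and it is not taken from any
standard.

```python
import unicodedata

# Hypothetical and deliberately incomplete: only the lowercase
# replacements discussed in this thread, keyed by language code.
FALLBACKS = {
    "da": {"\u00e5": "aa"},
    "de": {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"},
}


def ascii_fallback(text: str, lang: str) -> str:
    """Apply language-specific replacements, then strip remaining marks."""
    # Compose first so that the table lookup sees precomposed characters.
    text = unicodedata.normalize("NFC", text)
    table = FALLBACKS.get(lang, {})
    replaced = "".join(table.get(ch, ch) for ch in text)
    # Anything not covered by the table loses its combining marks.
    decomposed = unicodedata.normalize("NFD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_fallback("f\u00fcr", "de"))       # fuer
print(ascii_fallback("ping\u00fcino", "es"))  # pinguino
print(ascii_fallback("\u00e5r", "da"))        # aar
```

The same “ü” comes out as “ue” for German and as “u” for Spanish, which
is why the language has to be known before any replacement is made.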

Yucca
Received on Fri Nov 01 2013 - 12:34:43 CDT
