Re: How to remove accents while conforming to language standards?

From: Petite Abeille <petite.abeille_at_gmail.com>
Date: Mon, 4 Nov 2013 22:20:04 +0100

On Nov 1, 2013, at 4:37 PM, Jennifer Wong <jennifer.wong_at_workday.com> wrote:

> I would like to ask for advice on removing accents from characters. While the normalization process is straightforward (NFD, remove accents), it does not take special cases into account. For example, in Danish, "å" should be mapped to "aa", not "a". Likewise, in German, "ä", "ö" and "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a", "o", "u". Are there common practices on how to handle these special cases? Thank you.

Perhaps Sean M. Burke's Unidecode! may be of interest:

http://interglacial.com/tpj/22/
http://search.cpan.org/perldoc/Text::Unidecode
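
For the specific cases in the question, a minimal Python sketch (not Unidecode itself, and not a complete treatment of any language) might apply a per-language override table before the generic NFD step. The names `LANGUAGE_OVERRIDES` and `strip_accents` and the language keys "da" and "de" are made up for illustration; the table holds only the lowercase mappings mentioned above, so uppercase letters and other languages would still need their own entries.

```python
import unicodedata

# Hypothetical override table: only the mappings named in the question.
# Uppercase forms and other languages are deliberately left out.
LANGUAGE_OVERRIDES = {
    "da": {"å": "aa"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue"},
}


def strip_accents(text, lang=None):
    """Remove accents, applying language-specific replacements first."""
    # Compose first so that "a" + combining ring matches the "å" table key.
    text = unicodedata.normalize("NFC", text)
    for src, dst in LANGUAGE_OVERRIDES.get(lang, {}).items():
        text = text.replace(src, dst)
    # Generic fallback: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("får", "da"))
print(strip_accents("schön", "de"))
print(strip_accents("café"))
```

Running the overrides before decomposition matters: once "ä" has been split into "a" plus a combining diaeresis, the generic step can no longer tell it apart from any other accented "a".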
Received on Mon Nov 04 2013 - 15:22:17 CST
