Hi Jennifer,
On Fri, Nov 1, 2013 at 8:37 AM, Jennifer Wong <jennifer.wong_at_workday.com> wrote:
> I would like to ask for advice on removing accents from characters.
> While the normalization process is straightforward (NFD, remove accents),
> it does not take special cases into account. For example, in Danish, "å"
> should be mapped to "aa", not "a". Likewise, in German, "ä" "ö" "ü" should
> be mapped to "ae", "oe" and "ue" respectively, not "a", "o", "u". Are
> there common practices on how to handle these special cases? Thank you.
>
Can you describe what your use case is?
One possible area that appears not to have been discussed yet is sorting of
strings and full-text search (as in ctrl-F in a browser or word processor).
If you are after those, then please look for "unicode collation" and "cldr
collation". The ICU libraries
<http://userguide.icu-project.org/collation> might also help.
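If plain accent removal really is what you need, here is a rough sketch of the approach from your message (NFD, then drop combining marks) with a per-language exception table applied first. It uses only Python's standard unicodedata module. The names OVERRIDES and strip_accents are made up for illustration, and the table contains only the mappings you listed; a real table would need uppercase forms and more languages.

```python
import unicodedata

# Hypothetical exception table, keyed by language code.
# Only the mappings mentioned in the question are included.
OVERRIDES = {
    "da": {"å": "aa"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue"},
}


def strip_accents(text, lang=None):
    """Remove accents, applying language-specific replacements first."""
    table = OVERRIDES.get(lang, {})
    out = []
    # Compose first so precomposed keys in the table match
    # even when the input arrives in decomposed form.
    for ch in unicodedata.normalize("NFC", text):
        if ch in table:
            out.append(table[ch])
            continue
        # Generic path: decompose, then drop combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


print(strip_accents("blå", "da"))
print(strip_accents("Müller", "de"))
print(strip_accents("Müller"))
```

Without a language code the function falls back to generic stripping, so the caller has to know the language of the text; that cannot be inferred from the characters alone.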
Best regards,
markus
Received on Mon Nov 04 2013 - 12:58:10 CST