On Mon, 4 Nov 2013 19:00:17 +0000
Jennifer Wong <jennifer.wong_at_workday.com> wrote:
> Thank you everyone for your input.
>
> The use case is that customers want to integrate data from our
> enterprise solution to their ASCII-based downstream systems. Thus all
> accents need to be removed.
Have you confirmed that they are using ASCII rather than, say, Latin-1?
Some people call Latin-1 ASCII!
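One rough way to test this is to take a sample of the data the downstream system already accepts and try encoding it both ways. The following is only a sketch: the helper name `fits` and the sample strings are made up for illustration and are not part of anyone's actual system.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample values, for illustration only.
for sample in ["Smith", "caf\u00e9", "\u0141\u00f3d\u017a"]:
    print(repr(sample), "ascii:", fits(sample, "ascii"),
          "latin-1:", fits(sample, "latin-1"))
```

If accented names such as the second sample already pass through the downstream system intact, it is probably not pure ASCII.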
> Ilay's "Transliteration on Passport" doc is very useful. We can use
> that as a basis to map special transliteration cases before
> normalizing and removing accents.
Have you checked how they are currently handling accents? Do you need
to be even more brutal in places and strip out apostrophes? An
O'Sullivan at my place of work had to accept the mangling of his
surname to Osullivan!
How are you constraining the input repertoire? Stripping diacritics
won't deal with U+0131 LATIN SMALL LETTER DOTLESS I, and would make a
mess of the usually incorrect <U+0131, U+0307 COMBINING DOT ABOVE>.
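To illustrate, here is a minimal sketch of the common "decompose, then drop combining marks" approach, assuming Python's standard `unicodedata` module; the function name `strip_diacritics` is hypothetical and this is not necessarily how the original poster implements it.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Normalize to NFD, then drop nonspacing marks (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# An ordinary accented letter decomposes and loses its accent.
print(strip_diacritics("\u00e9").encode("ascii"))

# U+0131 has no decomposition, so it passes through unchanged and is
# still not ASCII.
dotless = strip_diacritics("\u0131")
print(dotless == "\u0131", dotless.isascii())

# <U+0131, U+0307> loses its dot and also ends up as a bare U+0131.
print(strip_diacritics("\u0131\u0307") == "\u0131")
```

Characters like U+0131 would need an explicit mapping (for example to "i") applied before or after the stripping step.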
Richard.
Received on Mon Nov 04 2013 - 14:21:09 CST