Re: How to remove accents while conforming to language standards?

From: Ilya Zakharevich <nospam-abuse_at_ilyaz.org>
Date: Fri, 1 Nov 2013 16:33:37 -0700

On Fri, Nov 01, 2013 at 07:32:44PM +0200, Jukka K. Korpela wrote:
> 2013-11-01 17:37, Jennifer Wong wrote:
>
> >I would like to ask for advice on removing accents from characters.
>
> To address first the question you ask in the Subject line, “How to
> remove accents while conforming to language standards?”, but do not
> ask in the message body, the answer is: You can’t.

Of course one can. The original poster even provided an algorithm to do it.

  (And to address “it is as acceptable as stripping the vowels from
   English”, stripping vowels from English CAN be done, and it MUST be
   done if the context requires it.)

This mailing list bursts with reasonable, insightful people. This
question comes up again and again; how come the same answer ALWAYS
pops out, an answer which is meaningless, unhelpful, and, MOREOVER,
wrong?

I suspect that what the participants wanted to write was that such
processes are usually LOSSY, not that they CANNOT be done. Given that
the initial question was more or less explicitly formulated as “how to
minimize the losses?”, I think that what is happening in this thread
is even less forgivable than the other times this has happened here…

When one MUST convert into an accent-less form [for human consumption]
(a situation which, being in the US, I frequently find myself in), SOME
losses are usually tolerable. One approach (which is very often
applicable) is “lossy; so what?”: just strip the accents away, and be happy.
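
A minimal sketch of the “lossy; so what?” approach described above. It
assumes Python and its standard unicodedata module (neither is mentioned
in this thread), and the function name strip_accents is hypothetical. It
decomposes each character and drops the combining marks; this is one
common way to strip accents, not necessarily the algorithm the original
poster had in mind.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lossy accent removal: decompose, then drop nonspacing marks."""
    # NFD splits a precomposed letter such as U+00E9 into its base
    # letter followed by a combining mark (U+0065 U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # General category "Mn" (Mark, nonspacing) covers the combining accents.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left, so the result is in NFC again.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_accents("Zo\u00eb M\u00fcller, caf\u00e9"))
```

Letters that have no canonical decomposition (Ø, Ł, ß and the like) pass
through this function unchanged, so it covers only part of the problem.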

If minimizing the losses is important, this question has also been
answered on this list. Checking “my database of useful answers”
  http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Useful_tidbits_from_Unicode_mailing_list_%28unsorted%29
I see:

  Transliteration on passports (see p.IV-48)
    http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf
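
A table-driven transliteration in the spirit of the passport rules cited
above might be sketched as follows, again assuming Python. The function
name, the table name and the handful of entries are illustrative
assumptions written from memory; the real list is longer, so check every
entry against the ICAO document before relying on it. Characters missing
from the table fall back to plain accent stripping.

```python
import unicodedata

# ASSUMPTION: a small illustrative subset of letter expansions; verify
# each entry against the ICAO transliteration table before use.
EXPANSIONS = {
    "\u00c4": "AE",  # A with diaeresis
    "\u00d6": "OE",  # O with diaeresis
    "\u00dc": "UE",  # U with diaeresis
    "\u00c6": "AE",  # ligature AE
    "\u00d8": "OE",  # O with stroke
    "\u00c5": "AA",  # A with ring above
    "\u00de": "TH",  # thorn
}


def transliterate(name: str) -> str:
    """Uppercase a name and replace accented letters, table first."""
    # Normalise to NFC so that the precomposed table keys match, then
    # uppercase; str.upper() already turns sharp s (U+00DF) into "SS".
    upper = unicodedata.normalize("NFC", name).upper()
    out = []
    for ch in upper:
        if ch in EXPANSIONS:
            out.append(EXPANSIONS[ch])
            continue
        # Fallback: decompose and drop the nonspacing combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append(
            "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
        )
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("J\u00fcrgen Gro\u00df"))
```

Looking up the table before falling back to stripping is what reduces
the losses: a letter with a conventional multi-letter spelling keeps it,
and only the remaining letters lose their accents outright.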

[BTW, the URL for the database contains a misprint; nowadays, most of
the entries are sorted into categories. “This one”, though, is not sorted.]

Hope this helps,
Ilya
Received on Fri Nov 01 2013 - 18:35:28 CDT
