From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 05 2007 - 18:10:43 CST
> What a strange idea... which would have the bad effect of creating lots of
> ambiguities, or unpronounceable and unrecognizable words (it won't even help
> English users).
etc., etc.
I don't see the point in respondents here on this list getting
so snippy and snide about the original posting. Maybe some people have
reasons for comparing Latin strings without their diacritics.
Yes, maybe they could do something more sophisticated with
an ICU collator, but then again, maybe they don't want something
more sophisticated.
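For the simple case, "comparing Latin strings without their diacritics" is usually done by decomposing to NFD and dropping the nonspacing marks. Below is a minimal Python sketch. It assumes that "comparison" means plain equality after folding, and it also case-folds, which is an extra choice on my part. The function names are illustrative only.

```python
import unicodedata

def strip_accents(s: str) -> str:
    """Decompose to NFD, then drop every nonspacing mark (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def equal_ignoring_accents(a: str, b: str) -> bool:
    # Assumption: accent-insensitive matching is also wanted case-insensitive.
    return strip_accents(a).casefold() == strip_accents(b).casefold()

print(strip_accents("Crème brûlée"))               # Creme brulee
print(equal_ignoring_accents("Résumé", "resume"))  # True
```

This is far cruder than an ICU collator at a primary strength, but for some purposes it may be all that is wanted.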
And nobody is telling people to spell French by removing the
accents, by the way.
There is a much better (and less chip-on-the-shoulder) discussion
of the .NET topic on Michael Kaplan's blog:
http://blogs.msdn.com/michkap/archive/2005/02/19/376617.aspx
Oh, and if anybody had bothered to track back to that discussion,
they would have seen that:
A) People were aware that Latin letters with diacritics that
don't have decompositions can't be treated this way simply
by doing an NFD (or NFKD) decomposition.
B) Nobody was proposing that Indic scripts be "vandalized" by
stripping out combining marks. This was an exercise in
folding Latin letters by removing accents.
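Point A can be seen directly. Letters with a stroke or bar, such as ø, ł and đ, have no canonical or compatibility decomposition. Decompose-and-strip therefore passes them through untouched. The following is a small illustrative check in Python. The helper name is hypothetical.

```python
import unicodedata

def nfkd_strip(s: str) -> str:
    # NFKD, then remove nonspacing marks; letters without a decomposition survive.
    return "".join(ch for ch in unicodedata.normalize("NFKD", s)
                   if unicodedata.category(ch) != "Mn")

for ch in "øłđ":
    # An empty string means the character has no decomposition mapping at all.
    print(repr(ch), repr(unicodedata.decomposition(ch)))

for word in ["Łódź", "Søren", "Đorđe"]:
    print(word, "->", nfkd_strip(word))
# Łódź -> Łodz   (ó and ź fold; Ł does not)
# Søren -> Søren
# Đorđe -> Đorđe
```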
Incidentally, in case anybody wasn't paying attention, the draft
UTR #30 on Character Foldings:
http://www.unicode.org/reports/tr30/
has for some time now discussed both "accent removal" folding
and "diacritic removal" folding, and it even has a provisional
data file, DiacriticFolding.txt, to assist with the latter.
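Getting past the NFD limitation takes some supplementary mapping for the letters that don't decompose. The Python sketch below is only loosely in the spirit of "diacritic removal" folding. The mapping table is a small hypothetical hand-picked sample. It is not the contents or format of DiacriticFolding.txt. The sketch also strips every nonspacing mark regardless of script, so, like the original exercise, it is meant only for Latin text.

```python
import unicodedata

# Hypothetical, hand-picked supplement for Latin letters whose diacritic is
# fused into the letter and so has no Unicode decomposition.
# NOT taken from DiacriticFolding.txt; illustration only.
EXTRA_FOLDS = str.maketrans({
    "Ø": "O", "ø": "o",
    "Ł": "L", "ł": "l",
    "Đ": "D", "đ": "d",
    "Ħ": "H", "ħ": "h",
})

def fold_diacritics(s: str) -> str:
    # Step 1: decompose and drop nonspacing marks (handles é, ñ, ü, ...).
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 2: map the remaining letters that have no decomposition.
    return stripped.translate(EXTRA_FOLDS)

print(fold_diacritics("Łódź"))               # Lodz
print(fold_diacritics("Søren Kierkegaard"))  # Soren Kierkegaard
```

A real implementation would take its supplementary mappings from a maintained data file rather than a handful of entries typed in by hand.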
--Ken
This archive was generated by hypermail 2.1.5 : Mon Mar 05 2007 - 18:13:02 CST