From: Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Date: Fri Dec 19 2003 - 10:29:23 EST
D. Starner writes:
>> The result is much better if you allow the ASCII conversion to be a string.
>> This allows you to have, e.g., "©" = "(c)", "½" = "1/2", and so on. This is also
>> good for letters: "ß" = "ss", "å" = "aa", etc.
>
> Et cetera? I think he needs more direction than that, especially since most naïve
> algorithms are going to produce "a" from "å". Digraphs can be treated
> as titlecase or capital or intelligently.
Hm. Actually I'll want a mode which generates "a" rather than "aa" for
that one, to mimic local practice for how to generate e-mail addresses.
Though that can be tacked on with an extra hack afterwards.
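A minimal sketch of that idea in Python, assuming a hand-written table whose values are strings rather than single characters. The names ASCII_MAP, EMAIL_OVERRIDES and to_ascii are hypothetical, and the entries are only the examples mentioned in this thread, not a complete mapping:

```python
# Hypothetical table: each character maps to an ASCII *string*,
# so one input character may expand to several output characters.
ASCII_MAP = {
    "©": "(c)",
    "½": "1/2",
    "ß": "ss",
    "å": "aa",
    "Å": "Aa",
}

# Hypothetical overrides for the e-mail address mode, where local
# practice writes "a" rather than "aa" for "å".
EMAIL_OVERRIDES = {"å": "a", "Å": "A"}


def to_ascii(text, email_mode=False):
    """Replace each mapped character; leave unmapped characters as they are."""
    table = dict(ASCII_MAP)
    if email_mode:
        # The "extra hack afterwards": overrides win over the default table.
        table.update(EMAIL_OVERRIDES)
    return "".join(table.get(ch, ch) for ch in text)
```

With this table, to_ascii("Håkon") gives "Haakon", while to_ascii("Håkon", email_mode=True) gives "Hakon". Keeping the overrides in a separate table leaves the default mapping untouched for other uses.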
One question, unless it has been answered already - I need to read up on
Unicode before I'll understand all the answers:
I'd like to translate 'ø' to 'o' or maybe 'oe'. 'o' at least when used
for matching, since it should match Swedish 'ö'. However,
UnicodeData.txt has no decomposition property for that character:
00F8;LATIN SMALL LETTER O WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER O SLASH;;00D8;;00D8
Is there some other property I can use? Or is this a rare special case
to handle by hand?
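The observation can be checked with Python's standard unicodedata module, which exposes the decomposition field of UnicodeData.txt. The sketch below assumes the matching use case: it decomposes, drops combining marks, and falls back on a small hand-maintained table for letters such as 'ø' that have no decomposition. HAND_MAP and fold_for_matching are hypothetical names, and the table holds only the character discussed here:

```python
import unicodedata

# Hypothetical hand-maintained exceptions: letters whose decomposition
# field in UnicodeData.txt is empty, so normalization leaves them alone.
HAND_MAP = {"ø": "o", "Ø": "O"}


def fold_for_matching(text):
    """Fold text to base letters so that e.g. 'ø' and 'ö' compare equal."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Skip combining marks such as U+0308 COMBINING DIAERESIS.
            continue
        out.append(HAND_MAP.get(ch, ch))
    return "".join(out)


# 'ö' decomposes into 'o' + U+0308, but 'ø' has an empty decomposition.
print(repr(unicodedata.decomposition("ö")))
print(repr(unicodedata.decomposition("ø")))
print(fold_for_matching("Søren") == fold_for_matching("Sören"))
```

Note that this folding also turns 'å' into 'a', which suits matching but is not necessarily what an ASCII conversion for display should produce.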
-- Hallvard