From: D. Starner (shalesller@writeme.com)
Date: Fri Dec 19 2003 - 08:21:48 EST
> The result is much better if you allow the ASCII conversion to be a string.
> This allows you to, e.g., "©" = "(c)", "½" = "1/2", and so on. This is also
> good for letters: "ß" = "ss", "å" = "aa", etc.
Et cetera? I think he needs more direction than that, especially since most
naïve algorithms are going to produce "a" from "å". Digraphs can be treated
as titlecase or capital or intelligently.
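
A rough sketch of those three casing options, in Python. The name case_digraph and the mode strings are made up for illustration. "Intelligently" is read here as "look at the next letter", which is only one possible guess at what it should mean.

```python
def case_digraph(digraph, mode, next_char=""):
    """Case an ASCII digraph that stands in for one capital letter.

    mode is "capital" (TH), "titlecase" (Th), or "intelligent"
    (TH when the following letter is uppercase, otherwise Th).
    The mode names are hypothetical, not taken from any standard.
    """
    if mode == "capital":
        return digraph.upper()
    if mode == "titlecase":
        return digraph.capitalize()
    if mode == "intelligent":
        # Assumption: an all-caps word wants "TH", a capitalized word "Th".
        return digraph.upper() if next_char.isupper() else digraph.capitalize()
    raise ValueError("unknown mode: " + mode)
```

With this, a capital thorn (00DE) followed by "ORN" would give "THORN" in intelligent mode, while the same letter followed by "orn" would give "Thorn".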
00FE - "th"
00DE - "TH"
00F0 - "dh" ("th"?)
00D0 - "DH" ("TH"?)
0108 - "CH" (Esperanto)
0109 - "ch"
011C, 011D - "GH", "gh" (E-o)
0124, 0125 - "HH", "hh" (")
0134, 0135 - "JH", "jh" (")
015C, 015D - "SH", "sh" (")
017F - "s"
Depending on your goals, 015F & 0161 could be "sh", 0163 "ts",
017E "zh", etc.
0195 - "hw"
01A3 - "gh"(?)
01BF - "w"
01C0 - "|" ("c"?)
01C1 - "||"? ("x"?)
01C3 - "!" ("q"?)
0223 - "w" ("ou"? "8"?)
I omitted most capitals and those that can be found by decomposition
or name stripping, as well as a bunch I don't know anything about.
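
To show how a hand-made table and decomposition might fit together, here is a minimal Python sketch. TABLE holds only a few entries copied from the list above and from the quoted message. The names TABLE and to_ascii, and the "?" placeholder for unmapped characters, are assumptions for illustration. The table is consulted first, so "å" comes out as "aa" rather than the "a" that plain decomposition would give.

```python
import unicodedata

# A handful of entries from the list above; the ones marked (?) there
# are just as uncertain here.
TABLE = {
    "\u00FE": "th", "\u00DE": "TH",
    "\u00F0": "dh", "\u00D0": "DH",
    "\u0108": "CH", "\u0109": "ch",
    "\u017F": "s",
    "\u0195": "hw",
    "\u01BF": "w",
    "\u00DF": "ss", "\u00E5": "aa",  # from the quoted message
}

def to_ascii(text, table=TABLE, unknown="?"):
    """Map text to ASCII: table first, then compatibility decomposition."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
        elif ch in table:
            out.append(table[ch])  # explicit string replacement
        else:
            # Decompose and keep only the ASCII characters that remain
            # (combining marks and other non-ASCII pieces are dropped).
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if ord(c) < 128)
            out.append(base or unknown)
    return "".join(out)
```

Characters with no table entry and no ASCII base after decomposition fall through to the placeholder, so the table still has to carry anything like the click letters or "©".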
This archive was generated by hypermail 2.1.5 : Fri Dec 19 2003 - 09:01:18 EST