From: Jungshik Shin (jshin@mailaps.org)
Date: Fri Dec 19 2003 - 08:10:31 EST
On Fri, 19 Dec 2003 jon@hackcraft.net wrote:
> Quoting Hallvard B Furuseth <h.b.furuseth@usit.uio.no>:
>
> > I need a function which converts Latin Unicode characters to the closest
> > equivalent ASCII characters, e.g. "é" -> "e".
> 1. Produce the NFD normalisation of the text.
> 2. Remove all characters with a non-zero combining class.
> 3. Some non-ASCII characters may remain (particularly those from non-Latin
> scripts). Some of these can be handled nicely, but others may require you to
> raise an exception or output a replacement character.
> on your application. Specialised handling of some characters is possible, for
> instance you could convert the trademark sign to "(TM)" to avoid confusion,
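
The three quoted steps can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name to_ascii, the SPECIAL table, and the default "?" replacement are made up for the example, and a real application would pick its own fallback policy.

```python
import unicodedata

# Hypothetical table of hand-picked substitutions for characters that
# NFD does not decompose; extend it to suit the application.
SPECIAL = {"\u2122": "(TM)"}  # TRADE MARK SIGN


def to_ascii(text, replacement="?"):
    """Approximate `text` in ASCII using the three steps quoted above."""
    out = []
    # Step 1: canonical decomposition (NFD).
    for ch in unicodedata.normalize("NFD", text):
        # Step 2: drop characters with a non-zero combining class.
        if unicodedata.combining(ch) != 0:
            continue
        # Step 3: deal with whatever non-ASCII characters remain.
        if ord(ch) < 128:
            out.append(ch)
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            out.append(replacement)  # or raise an exception instead
    return "".join(out)
```

With this sketch, to_ascii("é") gives "e", while a letter such as "ø", which has no canonical decomposition, falls through to the replacement character.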
For Korean syllables (U+AC00 - U+Dxxx), you can use 'Hangul Syllable
Short Names' that can be algorithmically derived with small tables.
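
A minimal sketch of that derivation in Python, assuming the arithmetic decomposition and the Jamo short-name tables given in the Unicode Standard for precomposed syllables in the block starting at U+AC00. The function name hangul_short_name is only for illustration.

```python
# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE = 0xAC00
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total

# Jamo short names: leading consonants, vowels, trailing consonants.
L_TABLE = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
           "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
V_TABLE = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
           "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
T_TABLE = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
           "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
           "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_short_name(ch):
    """Return the short name of one precomposed Hangul syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable: %r" % ch)
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return L_TABLE[l_index] + V_TABLE[v_index] + T_TABLE[t_index]
```

For instance, U+D55C U+AE00 comes out as "HAN" followed by "GEUL". The result is upper case, so an application may want to lower-case it or adjust capitalisation afterwards.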