From: Richard Ishida (ishida@w3.org)
Date: Thu Feb 12 2009 - 09:12:09 CST
Michael,
You can look at my source code for normalization in PHP or JavaScript at
http://rishida.net/blog/?p=222 , if that's any help.
RI
============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
http://www.w3.org/International/
http://rishida.net/
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
> On Behalf Of Michael D. Adams
> Sent: 12 February 2009 02:33
> To: unicode@unicode.org
> Subject: Normalization Implementation Tricks
>
> How do people efficiently implement the (re-)composition table used by
> the normalization algorithm for NFC and NFKC? (I am writing a
> library for a project.)
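For concreteness, the table in question maps a (starter, combining character) pair to the primary composite it forms. Below is a minimal Python sketch that derives it from the standard `unicodedata` module. `build_pairs` is a hypothetical helper name, and the sketch assumes that the Unicode version bundled with your Python is the one you want. Rather than parsing CompositionExclusions.txt, it uses `unicodedata.normalize` as an oracle: a two-code-point canonical decomposition is kept only if NFC actually recomposes it. Hangul syllables are skipped on the assumption that they are composed arithmetically rather than stored.

```python
import sys
import unicodedata


def build_pairs() -> dict[tuple[int, int], int]:
    """Map (first, second) code points to the primary composite they form."""
    pairs = {}
    for cp in range(sys.maxunicode + 1):
        if 0xAC00 <= cp <= 0xD7A3:
            continue  # Hangul syllables: composed arithmetically, not tabled
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility (tagged) one
        parts = [int(p, 16) for p in decomp.split()]
        if len(parts) != 2:
            continue  # singleton decompositions never recompose
        first, second = parts
        # Composition exclusions and non-starter decompositions fail this
        # check, because NFC leaves the pair decomposed.
        if unicodedata.normalize("NFC", chr(first) + chr(second)) == chr(cp):
            pairs[(first, second)] = cp
    return pairs
```

A C library would more likely generate the same pairs offline from UnicodeData.txt and CompositionExclusions.txt and emit them as static arrays.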
>
> The most naive implementation would be a table indexed by a starter
> character and a combining character. Of course that is completely
> unreasonable as it would require 0x110000 * 0x110000 entries (a few
> terabytes).
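Spelling out the arithmetic, assuming each cell holds one 32-bit code point (the cell width is an assumption; 21 bits would suffice):

```latex
\begin{aligned}
\text{entries} &= \mathtt{0x110000}^2 = 1\,114\,112^2 = 1\,241\,245\,548\,544 \approx 1.24 \times 10^{12} \\
\text{bytes}   &\approx 4 \times 1.24 \times 10^{12} \approx 4.96 \times 10^{12} \approx 5\ \text{TB}
\end{aligned}
```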
>
> If I understand correctly, the ICU library uses shared tries (as the Unicode
> spec suggests) indexed by the starter character that point to lists of
> combining character and result pairs (an association list in
> Lisp/Scheme terminology). This should reduce the size requirements,
> but now there is a list we have to scan through, which could increase
> run-time access cost.
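The per-starter association-list layout described above can be sketched roughly as follows. This is a simplified illustration, not ICU's actual data structure: a Python dict stands in for the trie, and `SAMPLE_PAIRS`, `group_by_starter`, and `compose` are hypothetical names. The sample is a tiny hand-picked subset of the real table. Keeping each list sorted by combining character lets the scan stop early.

```python
from collections import defaultdict

# Hypothetical sample of the composition table: (starter, combining) -> composite.
SAMPLE_PAIRS = {
    (0x0041, 0x0300): 0x00C0,  # A + grave -> À
    (0x0041, 0x0301): 0x00C1,  # A + acute -> Á
    (0x0041, 0x0302): 0x00C2,  # A + circumflex -> Â
    (0x0041, 0x030A): 0x00C5,  # A + ring above -> Å
    (0x0065, 0x0301): 0x00E9,  # e + acute -> é
    (0x00C7, 0x0301): 0x1E08,  # Ç + acute -> Ḉ
}


def group_by_starter(pairs):
    """Build starter -> [(combining, composite), ...], sorted by combining char."""
    lists = defaultdict(list)
    for (starter, mark), composite in pairs.items():
        lists[starter].append((mark, composite))
    for entries in lists.values():
        entries.sort()
    return dict(lists)


def compose(lists, starter, mark):
    """Return the composite for (starter, mark), or None.

    The dict lookup stands in for the trie; the loop is the linear scan
    over the starter's association list.
    """
    for candidate, composite in lists.get(starter, ()):
        if candidate == mark:
            return composite
        if candidate > mark:
            break  # sorted list: the mark cannot appear later
    return None
```

Because the lists are sorted, `bisect` could replace the linear scan. The scan cost in practice depends on the length of the longest list in the real data, which is worth measuring on the full table before optimizing.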
>
> Are there any other implementation methods that have a small memory
> footprint (~10-20 KB) and quick access (~10-20 instructions)? Any
> guidance in this regard would be appreciated.
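One possible alternative with bounded lookup cost is a dense matrix. This is a sketch of a general technique, not necessarily what any particular library does. Each code point that occurs as the first element of a pair gets a small row number, and each that occurs as the second element gets a column number. A lookup is then two index lookups plus one array read. Hangul is handled with the arithmetic from the Unicode Standard (section 3.12), so those 11,172 syllables need no table space. `DenseComposer` and `SAMPLE_PAIRS` are hypothetical names. In C, the two index maps would likely be compact tries or staged arrays rather than the dicts used here.

```python
# Hangul composition constants (Unicode Standard, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables

# Hypothetical sample of the composition table: (starter, combining) -> composite.
SAMPLE_PAIRS = {
    (0x0041, 0x0300): 0x00C0,  # A + grave -> À
    (0x0041, 0x0301): 0x00C1,  # A + acute -> Á
    (0x0041, 0x0302): 0x00C2,  # A + circumflex -> Â
    (0x0041, 0x030A): 0x00C5,  # A + ring above -> Å
    (0x0065, 0x0301): 0x00E9,  # e + acute -> é
    (0x00C7, 0x0301): 0x1E08,  # Ç + acute -> Ḉ
}


class DenseComposer:
    """Composition lookup through a rows x cols matrix of composites."""

    def __init__(self, pairs):
        firsts = sorted({a for a, _ in pairs})
        seconds = sorted({b for _, b in pairs})
        self.row = {cp: i for i, cp in enumerate(firsts)}
        self.col = {cp: j for j, cp in enumerate(seconds)}
        self.ncols = len(seconds)
        # Flat row-major matrix; 0 means "no composite" (U+0000 never is one).
        self.matrix = [0] * (len(firsts) * self.ncols)
        for (a, b), composite in pairs.items():
            self.matrix[self.row[a] * self.ncols + self.col[b]] = composite

    def compose(self, a, b):
        """Return the primary composite of a followed by b, or None."""
        # Hangul L + V -> LV syllable.
        if L_BASE <= a < L_BASE + L_COUNT and V_BASE <= b < V_BASE + V_COUNT:
            return S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT
        # Hangul LV + T -> LVT syllable.
        if (S_BASE <= a < S_BASE + S_COUNT and (a - S_BASE) % T_COUNT == 0
                and T_BASE < b < T_BASE + T_COUNT):
            return a + (b - T_BASE)
        i = self.row.get(a)
        j = self.col.get(b)
        if i is None or j is None:
            return None
        return self.matrix[i * self.ncols + j] or None

    def size_in_cells(self):
        return len(self.matrix)
```

With the full table the matrix is mostly zeros, so whether it fits a 10-20 KB budget depends on how many distinct first and second elements the real data has. Printing `size_in_cells()` for the full table answers that directly. A hash table keyed on the packed pair `(a << 21) | b` is another option; it avoids the sparse matrix at the cost of hashing and probing.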
>
> Michael D. Adams
> mdmkolbe@gmail.com
This archive was generated by hypermail 2.1.5 : Thu Feb 12 2009 - 09:16:56 CST