Re: Canonical equivalence in rendering: mandatory or recommended?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Oct 15 2003 - 11:15:23 CST


Jill Ramonsky wrote:
> I had to write an API for my employer last year to handle some aspects
> of Unicode. We normalised everything to NFD, not NFC (but that's easier,
> not harder). Nonetheless, all the string handling routines were not
> allowed to assume that the input was in NFD, but they had to guarantee
> that the output was. These routines, therefore, had to do a "convert to
> NFD" on every input, even if the input were already in NFD. This did
> have a significant performance hit, since we were handling (Unicode)
> strings throughout the app.

Note that, in addition to keeping "is normalized" flags, it is much faster to check whether a string
is already normalized, and to normalize it only if it is not. This holds at least when there is a
good chance that the string is already normalized, as appears to be true in your application, and as
is usually true in most other applications, which check for NFC on input. See UAX #15 for details.
ICU has quick check and normalization functions.
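
The check-first approach can be sketched as follows. This is only an illustration, written in Python
with the standard unicodedata module rather than ICU; the helper name to_nfd is hypothetical, and
unicodedata.is_normalized assumes Python 3.8 or later.

```python
import unicodedata


def to_nfd(s: str) -> str:
    """Return s in NFD, normalizing only when the check says it is needed.

    Hypothetical helper for illustration; not taken from ICU or any
    particular application.
    """
    # Cheap path: the string is already in NFD, so hand it back unchanged.
    if unicodedata.is_normalized("NFD", s):
        return s
    # Slow path: the string is not in NFD, so run the full normalization.
    return unicodedata.normalize("NFD", s)
```

With this shape, input that is already in NFD pays only for the check, and the output is still
guaranteed to be in NFD whatever the input was.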

markus


