From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Jun 29 2007 - 12:24:03 CDT
Atif Gulzar wrote:
>
> In reference to figure 6 (attached with message) in "Unicode Standard
> Annex #15 Unicode Normalization Forms", the fi ligature (U+FB01) is
> not decomposed for NFD (Normalization form D). I am just confused why
> it is not decomposed to "f" and "i". Is it not difficult for search
> algorithms to search all the words containing f (U+0066) if the data
> is stored in NFD?
>
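Because U+FB01 is a compatibility character: it has a compatibility
decomposition, not a canonical one. NFD applies only canonical
decompositions, so it leaves the ligature alone. NFKD applies
compatibility decompositions as well, so if you use NFKD you get the
behavior you want. The UnicodeData.txt entry carries the <compat> tag: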
FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;
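To see the difference on a concrete string, here is a minimal sketch using Python's standard unicodedata module. The helper name contains_f and the sample word are made up for illustration only; they do not come from UAX #15.

```python
import unicodedata

FI_LIGATURE = "\uFB01"  # LATIN SMALL LIGATURE FI


def contains_f(text: str, form: str) -> bool:
    """Hypothetical helper: normalize to `form`, then look for U+0066."""
    return "f" in unicodedata.normalize(form, text)


# Sample word spelled with the ligature instead of "f" + "i".
word = FI_LIGATURE + "nd"

# NFD applies canonical decompositions only, so the ligature survives.
nfd = unicodedata.normalize("NFD", word)

# NFKD also applies compatibility decompositions, so it becomes "f" + "i".
nfkd = unicodedata.normalize("NFKD", word)

# The decomposition mapping as recorded in the character database.
mapping = unicodedata.decomposition(FI_LIGATURE)

print(contains_f(word, "NFD"), contains_f(word, "NFKD"), mapping)
```

A search for U+0066 therefore only finds this word if the stored text (or a normalized copy kept for matching) is in NFKD or NFKC. Note that the compatibility forms discard distinctions such as the ligature itself, so whether to store the data that way or only normalize for comparison is a design choice.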
Addison
-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.