Re: Normalization: NFD and fi (U+FB01) ligature

From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Jun 29 2007 - 12:24:03 CDT

Next message: Hans Aberg: "Re: Demande d'aide pour écrire avec la Feera (écriture Afar)"

Previous message: Atif Gulzar: "Normalization: NFD and fi (U+FB01) ligature"
In reply to: Atif Gulzar: "Normalization: NFD and fi (U+FB01) ligature"

Atif Gulzar wrote:
>
> In reference to figure 6 (attached with message) in "Unicode Standard
> Annex #15 Unicode Normalization Forms", the fi ligature (U+FB01) is
> not decomposed for NFD (Normalization form D). I am just confused why
> it is not decomposed to "f" and "i". Is it not difficult for search
> algorithms to search all the words containing f (U+0066) if the data
> is stored in NFD?
>

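NFD leaves U+FB01 alone because it is a compatibility character: its
decomposition is a compatibility decomposition, not a canonical one, and
NFD applies only canonical decompositions. If you use NFKD you get the
behavior you want: U+FB01 becomes "f" (U+0066) followed by "i" (U+0069).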

FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;
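
The difference can be checked with a short sketch. This one assumes
Python's standard unicodedata module; the helper name contains_f and the
sample word are made up for illustration and are not part of any standard.

```python
import unicodedata

FI_LIGATURE = "\ufb01"  # LATIN SMALL LIGATURE FI


def contains_f(text: str, form: str) -> bool:
    """Hypothetical helper: is U+0066 present after normalizing to `form`?"""
    return "f" in unicodedata.normalize(form, text)


# The decomposition field from the UnicodeData entry, including the <compat> tag.
decomp = unicodedata.decomposition(FI_LIGATURE)

# NFD applies canonical decompositions only, so the ligature is unchanged.
nfd = unicodedata.normalize("NFD", FI_LIGATURE)

# NFKD also applies compatibility decompositions, giving "f" + "i".
nfkd = unicodedata.normalize("NFKD", FI_LIGATURE)

word = "\ufb01sh"  # sample word spelled with the ligature
print(decomp)
print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfkd])
print(contains_f(word, "NFD"), contains_f(word, "NFKD"))
```

A search for U+0066 over NFD data would miss the ligature, while the same
search over NFKD data would find it. NFKD also folds other compatibility
distinctions, so it is usually better suited to building a search key than
to replacing the stored text.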

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG
Internationalization is an architecture.
It is not a feature.
