Re: Normalization: NFD and fi (U+FB01) ligature

From: Addison Phillips
Date: Fri Jun 29 2007 - 12:24:03 CDT

    Atif Gulzar wrote:
    > In reference to figure 6 (attached with message) in "Unicode Standard
    > Annex #15 Unicode Normalization Forms", the fi ligature (U+FB01) is
    > not decomposed for NFD (Normalization form D). I am just confused why
    > it is not decomposed to "f" and "i". Is it not difficult for search
    > algorithms to search all the words containing f (U+0066) if the data
    > is stored in NFD?

    U+FB01 is a compatibility character: its decomposition mapping is a
    compatibility decomposition, not a canonical one, so NFD (which applies
    only canonical decompositions) leaves it unchanged. If you use NFKD you
    get the behavior you want.

    FB01;LATIN SMALL LIGATURE FI;Ll;0;L;<compat> 0066 0069;;;;N;;;;;
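
    A minimal sketch of the difference, assuming Python's standard
    unicodedata module (not something the original question mentioned).
    The sample word and the helper name normalized_forms are made up for
    illustration only.

```python
import unicodedata

LIGATURE_FI = "\ufb01"  # LATIN SMALL LIGATURE FI


def normalized_forms(text):
    """Return the text under each of the four normalization forms.

    Hypothetical helper for illustration; it only wraps unicodedata.normalize.
    """
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


forms = normalized_forms(LIGATURE_FI)
nfd = forms["NFD"]    # canonical decomposition only: ligature is kept
nfkd = forms["NFKD"]  # compatibility decomposition: becomes "f" + "i"

# The decomposition field from the character database, including the
# <compat> tag that marks it as a compatibility mapping.
decomposition = unicodedata.decomposition(LIGATURE_FI)

# Illustrative search: a word stored with the ligature (sample data).
word = "\ufb01nd"
found_in_nfd = "f" in unicodedata.normalize("NFD", word)
found_in_nfkd = "f" in unicodedata.normalize("NFKD", word)

for form, value in forms.items():
    print(form, [f"U+{ord(ch):04X}" for ch in value])
print("decomposition:", decomposition)
print("'f' found in NFD text: ", found_in_nfd)
print("'f' found in NFKD text:", found_in_nfkd)
```

    With data stored in NFD, a plain search for U+0066 would miss the
    ligature; normalizing both the stored text and the search key to NFKD
    (or NFKC) is one way to make them match.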


    Addison Phillips
    Globalization Architect -- Yahoo! Inc.
    Chair -- W3C Internationalization Core WG
    Internationalization is an architecture.
    It is not a feature.
