Re: Looking for a C library that converts UTF-8 strings from their decomposed to pre-composed form

From: Deborah Goldsmith (goldsmit@apple.com)
Date: Mon Nov 08 2004 - 20:57:04 CST


    I think he's saying he wants to convert to NFC *from* Mac OS X data, in
    which case the fact that Mac OS X's file system normalization is not
    strict NFD doesn't really matter. Also, he says he's running on
    Solaris, which would make it a tad difficult to call a Mac OS X API.
    ICU should do the trick.

    It's worth pointing out that there is no such thing as "precomposed
    Unicode". Normalization form C (NFC) could be called "as precomposed as
    possible." There are some sequences of Unicode that can only be
    expressed using combining marks.
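
    A minimal sketch of the "as precomposed as possible" point, using
    Python's standard unicodedata module purely for illustration (it is not
    the C library the thread asks about, and the helper name to_nfc_utf8 is
    hypothetical). It assumes the input is well-formed UTF-8 and says
    nothing about Mac OS X's file-system variant of decomposition.

```python
import unicodedata


def to_nfc_utf8(data: bytes) -> bytes:
    """Decode UTF-8 bytes, apply NFC, and re-encode as UTF-8."""
    text = data.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# "e" + COMBINING ACUTE ACCENT (U+0301) has a precomposed form, U+00E9.
decomposed_e = "e\u0301".encode("utf-8")

# "q" + COMBINING DOT BELOW (U+0323) has no precomposed code point,
# so NFC has to leave the combining mark in place.
decomposed_q = "q\u0323".encode("utf-8")

composed_e = to_nfc_utf8(decomposed_e)
composed_q = to_nfc_utf8(decomposed_q)
```

    The first sequence collapses to a single code point, while the second
    comes back unchanged, because no precomposed character exists for it.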

    Deborah Goldsmith
    Internationalization, Unicode liaison
    Apple Computer, Inc.
    goldsmit@apple.com

    On Nov 8, 2004, at 5:17 PM, Markus Scherer wrote:

    > Tay, William wrote:
    >> Is there any C library available that converts the decomposed UTF-8
    >> byte streams into the pre-composed equivalent?
    >
    > MacOS X does decompose filenames, but it does not use standard Unicode
    > normalization (because it was designed before Unicode's normalization
    > was finalized.) I suggest you search the mailing list archive for this
    > list for more details. You probably need to use a MacOS system
    > function.
    >
    > ICU has options for normalization (some defined with internal
    > constants only) which may or may not match, or get close to, MacOS
    > filename normalization:
    > http://oss.software.ibm.com/cgi-bin/icu/nbrowser
    >
    > markus
    >


