Re: Simple case mapping and normalization

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 20 2009 - 17:35:17 CST

    Tim Greenwood asked:

    > I see from section 5.18 of TUS that normalization, and in particular
    > NFC is not guaranteed to be preserved after a case conversion
    > operation. Is this still true if only the simple uppercase or simple
    > lowercase from UnicodeData.txt is used instead of the full case
    > mapping?

    Yes.

    > If not can someone provide an example.

    I assume you mean "If so..."

    Here is an example:

    <004A, 030C> is in NFC. (Capital letter J with a combining hacek)

    If I use a simple lowercase transform on that I get:

    <006A, 030C>. That is not in NFC, because NFC(<006A, 030C>) --> <01F0>
    (small letter j with hacek, which has no precomposed capital counterpart).
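
The J example can be checked with a short Python sketch using the standard unicodedata module. One assumption to flag: Python's str.lower() applies the full case mapping rather than the simple one from UnicodeData.txt, but for these two code points the two mappings give the same result. The helper name is_nfc is only illustrative.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

upper = "\u004A\u030C"   # LATIN CAPITAL LETTER J + COMBINING CARON
# Assumption: str.lower() (full mapping) equals the simple mapping here.
lower = upper.lower()
recomposed = unicodedata.normalize("NFC", lower)

print(is_nfc(upper))                          # True
print([f"{ord(c):04X}" for c in lower])       # ['006A', '030C']
print(is_nfc(lower))                          # False
print([f"{ord(c):04X}" for c in recomposed])  # ['01F0']
```

The case conversion itself never normalizes, so a caller that needs NFC output has to renormalize after lowercasing.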

    I think you can also get into trouble with combining marks
    applied to capital Greek rho (cf. 1FE4) and possibly the
    Latin long-s, too.
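
A sketch of how those two suspicions could be tested in Python follows. The specific sequences are assumptions chosen for illustration, not taken from the message: capital rho with a combining psili (U+0313), and long s with a combining dot below (U+0323). As above, str.lower() and str.upper() apply full case mapping, which is assumed to match the simple mapping for these code points. The helper name check is hypothetical.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def check(s: str, mapping):
    """Return (input is NFC, mapped result is NFC, NFC of mapped result)."""
    mapped = mapping(s)
    return nfc(s) == s, nfc(mapped) == mapped, nfc(mapped)

# Assumed test sequences (not from the original message):
cases = [
    ("\u03A1\u0313", str.lower),  # capital rho + psili, lowercased
    ("\u017F\u0323", str.upper),  # long s + dot below, uppercased
]

for text, mapping in cases:
    before_ok, after_ok, composed = check(text, mapping)
    print(before_ok, after_ok, [f"{ord(c):04X}" for c in composed])
# True False ['1FE4']
# True False ['1E62']
```

In the long-s case the breakage runs in the uppercase direction: long s maps to capital S, which has a precomposed form with dot below that long s lacks.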

    In other words, there are a few edge cases where the simple case
    mappings between precomposed characters with diacritics are not
    completely symmetrical one-to-one, typically because a precomposed
    form exists for a letter in one case but not in the other.

    --Ken


