Re: Simple case mapping and normalization

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 20 2009 - 17:35:17 CST

Next message: announcements@unicode.org: "[Unicode Announcement] New Public Review Issues for UAXes"

Previous message: vanisaac@boil.afraid.org: "Another new draft for Chinook script proposal- version 3.3"
Maybe in reply to: Tim Greenwood: "Simple case mapping and normalization"

Tim Greenwood asked:

> I see from section 5.18 of TUS that normalization, and in particular
> NFC is not guaranteed to be preserved after a case conversion
> operation. Is this still true if only the simple uppercase or simple
> lowercase from UnicodeData.txt is used instead of the full case
> mapping?

Yes.

> If not can someone provide an example.

I assume you mean "If so..."

Here is an example:

<004A, 030C> is in NFC. (Capital letter J with a combining hacek)

If I use a simple lowercase transform on that I get:

<006A, 030C>, which is not in NFC, because NFC(<006A, 030C>) --> <01F0>
(small letter j with caron, which has no precomposed capital counterpart)
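
A minimal Python sketch of this example, using the standard unicodedata
module. One assumption to note: Python's str.lower() applies the full case
mapping rather than the simple mapping from UnicodeData.txt, but for U+004A
the two agree, so the result should be the same. The helper name is_nfc is
only illustrative.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

# <004A, 030C>: capital J followed by combining caron (hacek)
upper_seq = "\u004A\u030C"

# Lowercasing maps 004A -> 006A and leaves 030C alone
# (assumption: str.lower() stands in for the simple lowercase mapping)
lower_seq = upper_seq.lower()

# What NFC turns the lowercased sequence into
recomposed = unicodedata.normalize("NFC", lower_seq)

print(is_nfc(upper_seq))   # the input is already in NFC
print(is_nfc(lower_seq))   # the lowercased result is not
print([f"{ord(c):04X}" for c in recomposed])
```

So a caller that needs NFC output has to renormalize after case
conversion; checking the input alone is not enough.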

I think you can also get into trouble with combining marks
applied to capital Greek rho (cf. 1FE4) and possibly the
Latin long-s, too.
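
The same kind of check can be sketched for these two cases. The specific
sequences below (capital rho with combining comma above, and long s with
combining acute) are chosen here for illustration and are not taken from
the message. As an assumption, str.lower() and str.upper() stand in for the
simple mappings, which agree with the full mappings for these characters.
The helper name breaks_nfc is hypothetical.

```python
import unicodedata

def breaks_nfc(text, transform):
    """True if text is in NFC but transform(text) is not."""
    def is_nfc(s):
        return unicodedata.normalize("NFC", s) == s
    return is_nfc(text) and not is_nfc(transform(text))

# <03A1, 0313>: capital rho + combining comma above (psili).
# Lowercasing gives <03C1, 0313>, which NFC composes (cf. 1FE4).
rho_case = breaks_nfc("\u03A1\u0313", str.lower)

# <017F, 0301>: long s + combining acute.
# Uppercasing gives <0053, 0301>, which NFC composes to a single character.
long_s_case = breaks_nfc("\u017F\u0301", str.upper)

print(rho_case, long_s_case)
```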

In other words, there are a few edge cases where the simple case
mappings between precomposed characters with diacritics are not
completely symmetrical and one-to-one: a precomposed form exists
in one case but not in the other.

--Ken

This archive was generated by hypermail 2.1.5 : Fri Feb 20 2009 - 17:39:23 CST