From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 20 2009 - 17:35:17 CST
Tim Greenwood asked:
> I see from section 5.18 of TUS that normalization, and in particular
> NFC is not guaranteed to be preserved after a case conversion
> operation. Is this still true if only the simple uppercase or simple
> lowercase from UnicodeData.txt is used instead of the full case
> mapping?
Yes.
> If not can someone provide an example.
I assume you mean "If so..."
Here is an example:
<004A, 030C> is in NFC. (Capital letter J with a combining hacek)
If I use a simple lowercase transform on that I get:
<006A, 030C>. That is not in NFC, because NFC(<006A, 030C>) --> <01F0>
(small letter j with hacek, which has no precomposed capital counterpart).
I think you can also get into trouble with combining marks
applied to capital Greek rho (cf. 1FE4) and possibly the
Latin long-s, too.
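
A quick way to check these cases is sketched below in Python. Two
assumptions: Python's str.lower() applies the full case mappings rather
than the simple ones from UnicodeData.txt, but for the characters used
here the two agree; and the rho case is taken to be capital rho followed
by U+0313 (combining comma above), since 1FE4 decomposes to <03C1, 0313>.
The helper names is_nfc and code_points are made up for illustration.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s


def code_points(s: str) -> str:
    """Format a string as <XXXX, XXXX> for display."""
    return "<" + ", ".join(f"{ord(ch):04X}" for ch in s) + ">"


# Sequences that are in NFC as written, because no precomposed
# capital form exists for them.
examples = {
    "capital J + combining hacek": "\u004A\u030C",
    "capital rho + combining comma above": "\u03A1\u0313",
}

for label, upper in examples.items():
    lower = upper.lower()  # full mapping; same as simple for these
    recomposed = unicodedata.normalize("NFC", lower)
    print(label)
    print("  original:   ", code_points(upper), "NFC?", is_nfc(upper))
    print("  lowercased: ", code_points(lower), "NFC?", is_nfc(lower))
    print("  NFC of that:", code_points(recomposed))
```

The practical upshot is that text which must stay in NFC should be
renormalized after case conversion rather than assumed to be stable.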
In other words, there are a few edge cases where the simple case
mappings between precomposed characters with diacritics are not
completely symmetrical: one case has a precomposed form and the
other does not.
--Ken