Normalization and the sample code

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Wed Jun 13 2001 - 18:14:38 EDT


All,

I have been playing with the sample code for normalization in UAX15 and the ICU4J classes that are, shall we say, "closely related" to the sample code.

If I ask for the NFKC form of the string U+0060 or U+005F (or of U+FF40 and U+FF3F, which are the wide equivalents and the original source of my woes), I get the sequence U+0020 U+0300 (or U+0020 U+0332). The wording of the UAX implies that this is the "correct" behavior, as long as you don't consider the spacing characters to be a "combination" of space and the non-spacing version of the character.
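For anyone who wants to reproduce this check outside ICU4J, here is a minimal sketch. It assumes Python's standard `unicodedata` module rather than the UAX15 sample code or the ICU4J classes, and the helper name `nfkc_codepoints` is made up for illustration. The results depend on the Unicode data version bundled with the interpreter, so they may not match what the 2001 sample code produces.

```python
import unicodedata

def nfkc_codepoints(text):
    """Return the NFKC form of text as a list of 'U+XXXX' labels.

    Hypothetical helper for illustration; it relies on the Unicode
    data shipped with the running Python interpreter.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return ["U+%04X" % ord(ch) for ch in normalized]

if __name__ == "__main__":
    # The four code points discussed in this message.
    for source in ("\u0060", "\u005F", "\uFF40", "\uFF3F"):
        label = "U+%04X" % ord(source)
        print(label, "->", " ".join(nfkc_codepoints(source)))
```

Printing the code points rather than the characters makes a stray U+0020 or combining mark visible, which is otherwise easy to miss on screen.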

The conformance test file says that FF40 and FF3F should become 0060 and 005F, but it says nothing about what 0060 and 005F themselves should ultimately become. Nor does it cover the sequences 0020 + 03xx in any way.
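A single conformance line can be checked against a normalizer roughly as follows. This is a sketch under stated assumptions: it assumes the test file's semicolon-separated layout of source; NFC; NFD; NFKC; NFKD in hexadecimal, it again uses Python's `unicodedata` in place of the sample code, and the `sample` line is written here to match the FF40 to 0060 statement above rather than copied from the file. The function names are hypothetical.

```python
import unicodedata

def parse_columns(line):
    """Split one test line into its first five columns as strings.

    Assumed layout: source; NFC; NFD; NFKC; NFKD; # comment
    Each column is a space-separated list of hex code points.
    """
    data = line.split("#", 1)[0]  # drop any trailing comment
    fields = [field.strip() for field in data.split(";")]
    return ["".join(chr(int(cp, 16)) for cp in field.split())
            for field in fields[:5]]

def nfkc_matches(line):
    """True if NFKC of every column equals the NFKC column (c4)."""
    c1, c2, c3, c4, c5 = parse_columns(line)
    return all(unicodedata.normalize("NFKC", col) == c4
               for col in (c1, c2, c3, c4, c5))

if __name__ == "__main__":
    # Illustrative line, not copied from the real test file.
    sample = "FF40;FF40;FF40;0060;0060;"
    print(nfkc_matches(sample))
```

A check like this only covers the code points the file actually lists, so it cannot by itself answer what 0060 or 0020 + 03xx should become.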

So, what's right?

1. I should get the sequence I get; or
2. There is a bug in the code; or
3. There is an omission in the tables.

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone) +1 408.210.3659 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.


