From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu May 31 2007 - 18:18:37 CDT
I see a strange sentence in the specification of the new "explicit"
character fallback substitutions, given in the CLDR 1.5 beta supplementary
file "characters.xml". It says:
"The recommended usage is that when a character value is not in the desired
repertoire, the explicit substitutes from characters.xml are tested one by
one against the repertoire, with the first substitute wholly in the
repertoire being substituted for the value in the output. If no explicit
substitute is found, then toNFC(value) is tried; if that fails then
toNFKC(value) is tried."
This definition seems to violate the current Unicode 5.0 rules, because
explicit fallbacks (not canonically equivalent) would take precedence over
NFC equivalents...
Such a definition would mean that renderers need to be changed to try
fallbacks BEFORE converting the string to NFC, which significantly
complicates the implementation.
I've looked at the current list of fallbacks, and in fact there is currently
NO case where an explicit fallback comes along with an NFC fallback.
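
A check of this kind can be sketched in Python. This is only an illustration
under assumptions: "sample" is a small hand-written dict standing in for the
data parsed from characters.xml (its entries are illustrative, not copied from
the file), and the function names are introduced here for the sketch.

```python
import unicodedata


def has_nfc_fallback(value: str) -> bool:
    """Return True if NFC changes the value, i.e. NFC offers a substitute."""
    return unicodedata.normalize("NFC", value) != value


def conflicting_entries(explicit_fallbacks: dict[str, list[str]]) -> list[str]:
    """List the values that have both an explicit and an NFC fallback."""
    return [value for value in explicit_fallbacks if has_nfc_fallback(value)]


# Hypothetical stand-in for the parsed characters.xml data.
sample = {
    "\u00bd": [" 1/2"],  # VULGAR FRACTION ONE HALF
    "\u20a9": ["KRW"],   # WON SIGN
}

print(conflicting_entries(sample))  # prints []
```

An empty list means the order of the explicit and NFC steps makes no
difference for the entries checked.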
The only significant change in those fallbacks is that there are now better
fallbacks than the NFKC compatibility equivalents. For example, numerical
fractions have an explicit fallback with a SPACE before the NFKC equivalent,
which works better for texts like "3<ONE HALF FRACTION>": this would fall
back to "31/2" using NFKC, instead of the better "3 1/2" with the explicit
fallback.
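
The difference can be reproduced with a few lines of Python. This is a
minimal sketch: the one-entry "explicit" table is a hypothetical stand-in for
the characters.xml data. Note that NFKC actually yields U+2044 FRACTION SLASH
rather than the ASCII "/", so an ASCII-only repertoire would still need one
more substitution after NFKC.

```python
import unicodedata

text = "3\u00bd"  # "3" followed by U+00BD VULGAR FRACTION ONE HALF

# NFKC replaces the fraction by its compatibility decomposition,
# with no separator before it.
nfkc = unicodedata.normalize("NFKC", text)

# Hypothetical explicit fallback table: a SPACE precedes the fraction.
explicit = {"\u00bd": " 1/2"}
with_explicit = "".join(explicit.get(ch, ch) for ch in text)

print(ascii(nfkc))     # prints '31\u20442'
print(with_explicit)   # prints 3 1/2
```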
So shouldn't this definition read as:
"The recommended usage is that when a character value is not in the desired
repertoire, then toNFC(value) is tried. If no NFC substitute is found, then
the explicit substitutes from characters.xml are tested one by one against
the repertoire, with the first substitute wholly in the repertoire being
substituted for the value in the output; if that fails then toNFKC(value) is
tried."
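
The two orderings can be compared with a short Python sketch. Everything
named here is an assumption made for illustration: the "substitute" function,
the set-of-characters model of a repertoire, and above all the explicit
fallback "A" for ANGSTROM SIGN, which is invented to show a conflict and is
not taken from characters.xml.

```python
import unicodedata
from typing import Optional


def in_repertoire(s: str, repertoire: set[str]) -> bool:
    """True if every character of s is in the repertoire."""
    return all(ch in repertoire for ch in s)


def substitute(
    value: str,
    repertoire: set[str],
    explicit: dict[str, list[str]],
    order: tuple[str, ...] = ("nfc", "explicit", "nfkc"),
) -> Optional[str]:
    """Try each step in turn; return the first candidate wholly in the repertoire."""
    for step in order:
        if step == "explicit":
            candidates = explicit.get(value, [])
        else:
            candidates = [unicodedata.normalize(step.upper(), value)]
        for candidate in candidates:
            if in_repertoire(candidate, repertoire):
                return candidate
    return None  # no substitute found


ascii_rep = {chr(c) for c in range(0x20, 0x7F)}
latin1_rep = ascii_rep | {chr(c) for c in range(0xA0, 0x100)}

# Invented explicit fallback for U+212B ANGSTROM SIGN (not in characters.xml).
explicit = {"\u212b": ["A"], "\u00bd": [" 1/2"]}

# Order proposed in this message: NFC wins, giving U+00C5.
print(ascii(substitute("\u212b", latin1_rep, explicit)))
# Order in the current text: the explicit fallback wins, giving "A".
print(substitute("\u212b", latin1_rep, explicit, order=("explicit", "nfc", "nfkc")))
```

With the NFC-first order, the canonically equivalent U+00C5 is chosen whenever
the repertoire contains it; with the explicit-first order, the
non-equivalent "A" would take precedence.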
Are you making this new definition for possible future fallbacks where it
would be better to use another newer fallback than the current NFC
substitutes (that can't be changed due to NFC stability)? If so, there's a
need to change some of the requirements for Unicode 5.0 conformance (because
this affects the character identity and the semantics), or the proposed new
order should be just optional.
For now, I see no justification (after looking at the proposed list) to
change the order of resolution in a way that prefers breaking the canonical
equivalence...
---
I also see that the data currently proposes the string "PHP" (the ISO
currency code for the Philippine Peso) as an explicit fallback for the PESO
SIGN, but I'm not sure that the PESO SIGN is restricted to the Philippines.
I think that the "Ps" fallback would be more appropriate.
The same goes for the WON SIGN, which uses the explicit fallback "KRW" and
assumes the South Korean currency, when the WON SIGN is also used for the
North Korean Won (KPW). Here too, I think that the "W/" fallback would be
more appropriate.
Such currency symbol substitutes are locale-dependent, so if they must be
kept, maybe they could be made localizable using CLDR locale data.
Philippe.