From: Mark Davis (mark.davis@icu-project.org)
Date: Fri Jun 01 2007 - 11:16:44 CDT
On 5/31/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
>
> I see a strange sentence in the specification of the new "explicit"
> character fallback substitutions, specified in the CLDR 1.5 beta
> supplemental file "characters.xml". It says:
>
> "The recommended usage is that when a character value is not in the
> desired repertoire, the explicit substitutes from characters.xml are
> tested one by one against the repertoire, with the first substitute
> wholly in the repertoire being substituted for the value in the output.
> If no explicit substitute is found, then toNFC(value) is tried; if that
> fails then toNFKC(value) is tried."
>
> This definition seems to violate the current Unicode 5.0 rules, because
> explicit fallbacks (which are not canonically equivalent) would take
> precedence over NFC equivalents...
>
> Such a definition would mean that renderers need to be changed to try
> fallbacks BEFORE converting the string to NFC, and this significantly
> complicates the implementation.
>
> I've looked at the current list of fallbacks, and in fact there is
> currently NO case where a character has both an explicit fallback and an
> NFC fallback.
>
> The only significant change in those fallbacks is that there are now
> better fallbacks than the NFKC compatibility equivalents. For example,
> numerical fractions have an explicit fallback that puts a SPACE before
> the NFKC equivalent, which works better for texts like
> "3<ONE HALF FRACTION>": NFKC would produce "31/2", whereas the explicit
> fallback gives the better "3 1/2".
>
> So shouldn't this definition read as:
>
> "The recommended usage is that when a character value is not in the
> desired repertoire, then toNFC(value) is tried. If no NFC substitute is
> found, then the explicit substitutes from characters.xml are tested one
> by one against the repertoire, with the first substitute wholly in the
> repertoire being substituted for the value in the output; if that fails
> then toNFKC(value) is tried."
>
> Are you making this new definition for possible future fallbacks, where
> it would be better to use a newer fallback than the current NFC
> substitutes (which can't be changed because of NFC stability)? If so,
> some of the requirements for Unicode 5.0 conformance need to change
> (because this affects character identity and semantics), or else the
> proposed new order should be merely optional.
>
> For now, after looking at the proposed list, I see no justification for
> changing the order of resolution in a way that prefers breaking
> canonical equivalence...
While this is not a matter of Unicode 5.0 conformance, it is a good
suggestion. Can you file it as a bug?
Mark
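For readers comparing the two orderings discussed in this thread, here is a minimal Python sketch. It is not CLDR's or ICU's implementation, only an illustration under stated assumptions: substitution is done per code point, a repertoire is modeled as a predicate, "?" is used when nothing fits, and the substitute table is hypothetical. In particular the ANGSTROM SIGN entry is invented, because the real list has no character with both an explicit and an NFC fallback. The names `substitute_text`, `BETA_ORDER`, and `PROPOSED_ORDER` are illustrative only.

```python
import unicodedata
from typing import Callable, Dict, List, Optional, Sequence

# A repertoire is modeled as a predicate on single characters (assumption).
Repertoire = Callable[[str], bool]
Step = Callable[[str, Repertoire], Optional[str]]

# Hypothetical substitute table, NOT copied from CLDR characters.xml.
# The ANGSTROM SIGN entry is invented so that an explicit substitute and an
# NFC mapping compete for the same character.
SUBSTITUTES: Dict[str, List[str]] = {
    "\u00BD": [" 1/2"],  # VULGAR FRACTION ONE HALF; leading space gives "3 1/2"
    "\u212B": ["A"],     # ANGSTROM SIGN; hypothetical lossy substitute
}


def fits(candidate: str, repertoire: Repertoire) -> bool:
    """True if every character of candidate is in the target repertoire."""
    return all(repertoire(ch) for ch in candidate)


def explicit(ch: str, repertoire: Repertoire) -> Optional[str]:
    """First listed explicit substitute that fits wholly, else None."""
    for sub in SUBSTITUTES.get(ch, []):
        if fits(sub, repertoire):
            return sub
    return None


def nfc(ch: str, repertoire: Repertoire) -> Optional[str]:
    cand = unicodedata.normalize("NFC", ch)
    return cand if fits(cand, repertoire) else None


def nfkc(ch: str, repertoire: Repertoire) -> Optional[str]:
    # Note: NFKC maps U+00BD to "1\u20442" (U+2044 FRACTION SLASH, not "/").
    cand = unicodedata.normalize("NFKC", ch)
    return cand if fits(cand, repertoire) else None


BETA_ORDER: Sequence[Step] = (explicit, nfc, nfkc)      # as worded in the beta text
PROPOSED_ORDER: Sequence[Step] = (nfc, explicit, nfkc)  # as proposed in this thread


def substitute_text(text: str, repertoire: Repertoire,
                    order: Sequence[Step], missing: str = "?") -> str:
    """Replace each out-of-repertoire character using the given step order.

    Works per code point and ignores combining sequences; `missing` is an
    assumed placeholder used when no step yields a fitting substitute.
    """
    out = []
    for ch in text:
        if repertoire(ch):
            out.append(ch)
            continue
        for step in order:
            result = step(ch, repertoire)
            if result is not None:
                out.append(result)
                break
        else:
            out.append(missing)
    return "".join(out)


def ascii_only(ch: str) -> bool:
    return ord(ch) < 0x80


def latin1(ch: str) -> bool:
    return ord(ch) < 0x100


if __name__ == "__main__":
    print(substitute_text("3\u00BD", ascii_only, BETA_ORDER))      # 3 1/2
    print(substitute_text("3\u00BD", ascii_only, PROPOSED_ORDER))  # 3 1/2
    print(substitute_text("5 \u212B", latin1, BETA_ORDER))         # 5 A
    print(substitute_text("5 \u212B", latin1, PROPOSED_ORDER))     # 5 Å
```

With real-world data like the fraction, both orders give the same "3 1/2", which matches the observation that no current entry competes with an NFC mapping. Only the invented ANGSTROM SIGN entry shows the difference: the beta order picks the non-equivalent "A", while the proposed order keeps the canonically equivalent U+00C5.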