Re: More Permanent Faults? - Unicode 5.0 Casefolding

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Jun 10 2006 - 05:57:57 CDT


    Mark Davis wrote on Saturday, June 10, 2006 at 1:10 AM

    > C9 basically says that you should respect canonical equivalence, and you
    > should be prepared for any other process to respect it. In the standard we
    > supply case folding operations that do not, in themselves, require
    > normalization, but in edge cases may not respect canonical equivalence.
    > While we strongly encourage that all processing respect canonical
    > equivalence, but recognize that for some common tasks like case folding,
    > people may not want to take on the extra performance / code-complicating
    > of adding normalization, to handle a small number of edge cases. But we
    > also define forms of case folding that do, in fact, respect canonical
    > equivalence.

    The problem then comes with conformance requirement C20:

    C20 An implementation that purports to support the default casing operations
    of case conversion, case detection, and caseless mapping shall do so in
    accordance with the definitions and specifications in Section 3.13, Default
    Case Operations.

    It seems then that the default uppercasings of <U+1FB3, U+0342> and
    <U+1FB3, U+0304> are <U+1FBC, U+0342> and <U+1FBC, U+0304> and their default
    casefoldings are <U+03B1, U+03B9, U+0342> and <U+03B1, U+03B9, U+0304>. Is
    this the case? If so, does C20 override C9? May correct processes offering
    'default full uppercasing (or casefolding) as defined by Unicode Version
    x.y' produce canonically inequivalent outputs?
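
    To make the casefolding half of the question concrete, here is a rough
    sketch in Python. It assumes that Python's str.casefold() applies the
    full default case folding and that unicodedata.normalize() gives NFD;
    the helper names and the choice of canonically equivalent partner
    strings (U+1FB7 and <U+1FB1, U+0345>) are mine, purely for illustration.
    It says nothing about uppercasing.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return nfd(a) == nfd(b)


def plain_casefold(s: str) -> str:
    """Full case folding with no normalization step (assumed default)."""
    return s.casefold()


def normalized_casefold(s: str) -> str:
    """Fold after NFD and renormalize: NFD(casefold(NFD(s)))."""
    return nfd(nfd(s).casefold())


# Each pair holds two canonically equivalent spellings of the same text.
PAIRS = [
    ("\u1fb3\u0342", "\u1fb7"),        # alpha + ypogegrammeni + perispomeni
    ("\u1fb3\u0304", "\u1fb1\u0345"),  # alpha + ypogegrammeni + macron
]

for a, b in PAIRS:
    print(
        canonically_equivalent(a, b),
        canonically_equivalent(plain_casefold(a), plain_casefold(b)),
        normalized_casefold(a) == normalized_casefold(b),
    )
```

    If those assumptions hold, the inputs in each pair compare as
    canonically equivalent, their plain foldings do not, and the foldings
    taken after NFD agree again, which is exactly the C9 versus C20 tension.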

    The issues are entirely restricted to trying to implement the default casing
    functions. Producing tailored casing functions is a different issue.

    The urgency arises from the imminent partial freezing of default
    casefolding.

    If the Unicode handling of Greek is to be improved, it may well require
    locale-sensitive rules. It may be as well to declare locales as inherently
    unstable - if Unicode lasts for centuries, they will be.

    Richard.


