Re: outside decomposed, inside precomposed

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Oct 13 2004 - 12:30:29 CST

    > Jon Hanna wrote:
    >
    > >>imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get
    > >>remapped
    > >>internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
    > >>
    > >>Is this kind of behavior what one would expect?
    > >>
    > >>
    > >
    > >That's conformant, if it causes problems with any other process (including
    > >other processes that are part of the system in question) then that other
    > >process isn't complying with conformance clause C9.
    > >
    > >

    And Eric Muller asked:
     
    > But what if U+1ebd is not part of the repertoire supported by that other
    > process?

    Ah, but there is "support" and then there is "support".

    A conformant implementation can pick and choose the repertoire it
    supports for some text processes, e.g. for display. No font is
    required to support display of *all* Unicode characters, and
    that could perfectly well apply to U+1EBD.

    However, implementations don't get to pick and choose so easily
    about aspects of the standard such as encoding forms and normalization.
    You can't, for example, recognize that <U+006E, U+0303> is canonically
    equivalent to U+00F1 (ñ), but claim *not* to recognize that
    <U+0065, U+0303> is likewise canonically equivalent to U+1EBD, simply
    because U+1EBD is not in a range that your implementation chooses
    to "interpret" for display. Such broken, partial recognitions of
    canonical equivalence would represent non-conformant implementations
    of normalization. That is also why most implementations should depend
    on library code for normalization, where the library code specifically
    claims to be a conformant implementation of normalization -- and
    handles *all* Unicode characters correctly.
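
    The point about library code can be illustrated with a minimal
    sketch, assuming Python and its standard unicodedata module (neither
    is named in this thread; the helper canonically_equivalent is a
    hypothetical name introduced here). A conformant normalizer composes
    both sequences, regardless of which characters any font can display:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The two decomposed sequences discussed in the thread.
decomposed_n = "\u006E\u0303"  # <n, combining tilde>
decomposed_e = "\u0065\u0303"  # <e, combining tilde>

# NFC composes each sequence to its precomposed character.
composed_n = unicodedata.normalize("NFC", decomposed_n)
composed_e = unicodedata.normalize("NFC", decomposed_e)

print(f"U+{ord(composed_n):04X}")  # U+00F1
print(f"U+{ord(composed_e):04X}")  # U+1EBD

# Equivalence holds for both pairs, not just the one a font happens to cover.
print(canonically_equivalent(decomposed_n, "\u00F1"))  # True
print(canonically_equivalent(decomposed_e, "\u1EBD"))  # True
```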

    --Ken


