RE: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 25 2007 - 18:56:33 CST


    Ruszlán asked:

    > Kenneth Whistler wrote:
    >
    > > 1. U+200D is not part of any canonical decomposition mapping,
    > > and there is 0% chance that the UTC would ever add ZWJ or
    > > ZWNJ to such mappings.
    >
    > And why not?

    Because, as Richard Wordingham pointed out:

    "In general, the effects of ZWJ and ZWNJ are optional."

    Canonical decomposition mappings are normative, required
    relations that define identity between characters and other
    sequences of characters.

    ZWJ and ZWNJ are, for the most part, hints regarding presentation
    and rendering, and have no impact on the interpretation of
    the identity of sequences.

    You are trying to use them to "enforce" shaping *and* to
    create canonical equivalences, where the UTC did not intend
    such effects and where implementers (for many years now) have
    not done so.

    As I said, there is 0% chance that the UTC is going to revisit
    such decisions and try to repurpose these characters to
    participate in canonical decomposition mappings.
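    A minimal sketch of the points above, assuming Python 3's standard
    unicodedata module (its Unicode Character Database version depends
    on the Python build). It scans every code point for a decomposition
    mapping that mentions ZWJ (U+200D) or ZWNJ (U+200C), and checks that
    normalization leaves a joiner in place rather than folding a sequence
    such as A + ZWJ + E into U+00C6. The helper names are illustrative
    only.

```python
import sys
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def joiners_in_decompositions():
    """List code points whose decomposition mapping contains ZWJ or ZWNJ."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        # decomposition() returns "" or space-separated hex code points,
        # optionally preceded by a <tag> for compatibility mappings.
        fields = unicodedata.decomposition(chr(cp)).split()
        if "200D" in fields or "200C" in fields:
            hits.append(cp)
    return hits


def survives_normalization(s):
    """True if all four normalization forms return s unchanged."""
    return all(unicodedata.normalize(form, s) == s
               for form in ("NFC", "NFD", "NFKC", "NFKD"))


if __name__ == "__main__":
    print(joiners_in_decompositions())
    print(survives_normalization("f" + ZWJ + "i"))
    # A + ZWJ + E is not canonically equivalent to U+00C6.
    print(unicodedata.normalize("NFC", "A" + ZWJ + "E") == "\u00c6")
```

    The joiner passes through every normalization form untouched, which
    is what "no impact on the interpretation of the identity of
    sequences" means in practice: a renderer may honor it or ignore it,
    but normalization never treats it as part of a character's identity.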

    > > 2. U+00C6 has no decomposition mapping now, and by normalization
    > > stability guarantees, none can be added.
    > >
    > > Please study:
    > >
    > > http://www.unicode.org/standard/stability_policy.html
    > >
    > > and in particular, item 3a under Decomposition Mapping in
    > > the Normalization Stability Policy.
    >
    > Hmmm... ok, though I cannot quite see the rationale behind such a
    > restrictive policy.
    > Why can't decomposition mappings be version-specific?

    Because then the status of strings as normalized or not
    would also be version-specific. A string stored by a Unicode 4.0
    application might turn out not to be normalized when read
    by a Unicode 5.0 application. That would be a completely
    unacceptable outcome for nearly all of the implementers
    out there -- and in particular for databases and all
    internet applications.
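    A toy illustration of the failure mode, again assuming Python 3's
    unicodedata module. The two mapping tables below are HYPOTHETICAL:
    the "V5" entry giving U+00C6 a decomposition (here A + ZWJ + E) is
    invented purely to show what a version-specific mapping would break,
    and is_decomposed() is a simplified stand-in for a real NFD check.
    The last lines use real data: the Unicode 3.2 tables bundled as
    unicodedata.ucd_3_2_0 and the current tables agree that U+00C6 has
    no decomposition mapping.

```python
import unicodedata

# HYPOTHETICAL decomposition tables for two versions of the standard.
# The V5 entry is invented for illustration; U+00C6 has no real
# decomposition mapping, and the stability policy forbids adding one.
HYPOTHETICAL_V4 = {}
HYPOTHETICAL_V5 = {"\u00c6": "A\u200dE"}


def is_decomposed(s, table):
    """Toy NFD check: no character in s has a mapping in table."""
    return all(ch not in table for ch in s)


stored = "\u00c6sir"  # string written by a "version 4" application

# Same stored string, different verdicts, if mappings were versioned.
print(is_decomposed(stored, HYPOTHETICAL_V4))
print(is_decomposed(stored, HYPOTHETICAL_V5))

# Real data: old and current tables give the same answer for U+00C6.
print(unicodedata.ucd_3_2_0.decomposition("\u00c6") == "")
print(unicodedata.decomposition("\u00c6") == "")
print(unicodedata.normalize("NFD", stored) == stored)
```

    Under the hypothetical tables, data that passed a normalization check
    when written would fail the same check when read back, which is
    exactly the outcome the stability policy rules out.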

    --Ken


