Re: The result of the plane 14 tag characters review.

From: George W Gerrity
Date: Wed Nov 13 2002 - 02:07:29 EST


    I have been watching this thread for some time now, and Doug Ewell's
    comments have prompted me to add my two cents' worth.

    In an effort to unify all characters and pictographs, the decision was
    made to unify CJK characters by suppressing most variant forms. That
    turns out to be the single greatest objection from users --
    especially Japanese -- and somehow we need a low-level way of
    indicating the target language in the context of multilingual text.

    The plane 14 tags seem to be appropriate to do this, giving a hint to
    the font engine as to a good choice of alternate glyphs, where

    The problems arise first, because the code scanner can no longer be
    stateless; second, because one needs to provide an override to
    higher-level layout engines; third, because it can't solve problems
    where multiple glyphs exist, whose use is highly context-dependent,
    as is the case for some Japanese texts; and fourth, because there is
    no one-to-one translation between the (largely) non-unified simplified
    and traditional characters in Chinese.
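
    To make the first point concrete, here is a minimal sketch of a
    scanner that has to carry language state along the text. It assumes
    the layout of the Tags block (U+E0001 LANGUAGE TAG, tag characters
    U+E0020..U+E007E mirroring ASCII, U+E007F CANCEL TAG). The function
    and constant names are invented for illustration, and it says nothing
    about how a real font engine would consume the result.

```python
# Sketch only: these names are illustrative, not from any library.
LANGUAGE_TAG = 0xE0001                   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                     # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E   # tag characters mirroring ASCII 0x20..0x7E


def split_language_runs(text):
    """Return (language, run) pairs; language is None where no tag is in force."""
    runs = []
    current_lang = None   # the state that makes the scanner non-stateless
    pending = None        # tag identifier being collected, or None
    buffer = []

    def flush():
        # Emit the text collected so far under the language then in force.
        if buffer:
            runs.append((current_lang, "".join(buffer)))
            buffer.clear()

    for ch in text:
        cp = ord(ch)
        if cp == LANGUAGE_TAG:
            flush()
            pending = []
        elif TAG_FIRST <= cp <= TAG_LAST:
            # Tag characters only count after a LANGUAGE TAG; strays are dropped.
            if pending is not None:
                pending.append(chr(cp - 0xE0000))
        elif cp == CANCEL_TAG:
            flush()
            pending = None
            current_lang = None
        else:
            if pending is not None:
                # First ordinary character ends the tag identifier.
                current_lang = "".join(pending) or None
                pending = None
            buffer.append(ch)
    flush()
    return runs
```

    Each run comes back paired with the language in force, which is the
    "hint" a glyph chooser would want; the cost is that no character can
    be interpreted without knowing what preceded it.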

    It seems to me that the Unicode people should bite the bullet and
    accept that, where the unification process creates problems, a
    solution needs to be provided. The language tags should be able to
    deal with most objections to rendering in a given language,
    _provided_ direction is given as to how plane 14 tags should behave
    (I say, as a hint for glyph choice), and how the rendering engine
    should communicate with higher-order text processing.

    Note that I am _not_ advocating the use of such tags to describe font
    _styles_ although when dealing with long s, for instance, the
    boundary is fuzzy.

    To suggest that such fundamental glyph choices as linguistic
    preference should be left to high-level markup in text-processing
    applications, without providing a unified way to do it, seems to
    violate the spirit of Unicode.


    >Kenneth Whistler <kenw at sybase dot com> wrote:
    >
    >> The Unicode Technical Committee would like to announce that no
    >> formal decision has been taken regarding the deprecation of
    >> Plane 14 language tag characters. The period for public review of
    >> this issue will be extended until February 14, 2003.
    >
    >Gee, a press conference after all. Too bad my TV was turned off.
    >
    >No, seriously, thanks for the update. I'm glad to see the matter was
    >considered worthy of further study. Hopefully other people who have an
    >opinion on Plane 14 will contribute to the public review.
    >
    >Ken also wrote:
    >
    >> Doug's contribution would be
    >> more convincing if it dropped away the irrelevancies about whether
    >> the *function* of language tagging is useful and focussed completely
    >> on the appropriateness of this *particular* set of characters on
    >> Plane 14 as opposed to any other means of conveying the same
    >> distinctions.
    >
    >That's why I included a "severability" clause, to the effect that if one
    >of my arguments was bogus (or irrelevant) it shouldn't affect the
    >credibility of the others.
    >
    >To answer the question "why Plane 14 plain-text instead of markup," I
    >suppose I need to make the case that this meta-information is sometimes
    >appropriate in short strings and labels where rich text is overkill.
    >This was basically the argument put forth by the ACAP people. I did
    >some homework on the MLSF proposal (a little late, I know) and saw that
    >their primary perceived need was for tagging short strings in protocols
    >which did not lend themselves to an additional rich-text layer.
    >
    >After seeing the MLSF tagging scheme, I agree more than ever that its
    >deployment would have jeopardized the usefulness of UTF-8. Although the
    >number of proposals like this to "extend" or "enhance" UTF-8 has
    >diminished greatly since then, it would be a shame to see them resurface
    >on the basis that "Unicode doesn't provide us any alternative."
    >
    >To me, the most difficult part of the "Save Plane 14" campaign seems to
    >be convincing people that not every text problem lends itself to a
    >markup solution. Without questioning the current and future importance
    >of HTML and XML, there *is* text in the world that is not wrapped in one
    >of these formats, and cannot be reasonably converted to them, yet still
    >needs to be processed in some way.
    >
    >Judging from the discussion on the list last week, there also seems to
    >be a perception that Plane 14 tags require a great deal of overhead,
    >even to ignore them. I'd like to continue that discussion (especially
    >since the public-review period has been extended) and ask:
    >
    >1. What extra processing is necessary to interpret Plane 14 tags that
    >wouldn't be necessary to interpret any other form of tags?
    >
    >2. What extra processing is necessary to ignore Plane 14 tags that
    >wouldn't be necessary to ignore any other Unicode character(s)?
    >
    >3. Is there any method of tagging, anywhere, that is lighter-weight
    >than Plane 14? (Corollary: Is "lightweight" important?)
    >
    >-Doug Ewell
    > Fullerton, California
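
    On Doug's questions 2 and 3, a rough sketch of what "ignoring" the
    tags amounts to, and what they cost in bytes. It assumes the Tags
    block occupies U+E0000..U+E007F; the helper names are invented for
    illustration and do not come from any existing library.

```python
# Sketch only: helper names are illustrative assumptions.
TAGS_BLOCK_FIRST, TAGS_BLOCK_LAST = 0xE0000, 0xE007F  # Plane 14 Tags block


def is_plane14_tag(ch):
    """True if ch lies in the Plane 14 Tags block."""
    return TAGS_BLOCK_FIRST <= ord(ch) <= TAGS_BLOCK_LAST


def strip_plane14_tags(text):
    """Ignore the tags: a stateless per-character range filter."""
    return "".join(ch for ch in text if not is_plane14_tag(ch))


def tag_overhead_utf8(text):
    """Bytes spent on tag characters when text is encoded as UTF-8."""
    # Every code point above U+FFFF takes four bytes in UTF-8.
    return sum(len(ch.encode("utf-8")) for ch in text if is_plane14_tag(ch))
```

    Dropping the tags needs no state at all, only a range test per
    character. The weight shows up in storage instead: each tag character
    takes four bytes in UTF-8, so a LANGUAGE TAG plus a two-letter
    identifier costs twelve bytes.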

    Dr George W Gerrity    Phone:  +61 2 6386 3431
    GWG Associates         Fax:    +61 2 6386 3431
    P O Box 229            Time:   +10 hours (ref GMT)
    Harden, NSW 2587       PGP RSA Public Key Fingerprint:
    AUSTRALIA                      73EF 318A DFF5 EB8A 6810 49AC 0763 AF07
