Re: Plane 14 redux

From: Doug Ewell (dewell@compuserve.com)
Date: Wed Sep 06 2000 - 10:12:42 EDT


Kenneth Whistler <kenw@sybase.com> wrote:

> However, there is great benefit in making a very strong recommendation
> about the content of language tags -- and making it in the context
> of the Unicode Standard itself, rather than someplace else. Tying
> them to RFC 1766 (or its successor) makes it possible to actually
> use them and expect a general parser to be buildable.

I can't argue with that. It might help, though, to mention the "or its
successor" part explicitly in any future revision or UAX, just as the
draft of the successor refers to language and country codes found in
the ISO standards "or subsequently made by the standard's registration
authority."
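The claim that tying tag content to RFC 1766 makes a general parser buildable can be shown with a short example. Below is a minimal Python sketch of an RFC 1766 syntax check, using that RFC's grammar: a primary tag of 1 to 8 letters, followed by any number of "-" subtags of 1 to 8 letters. The function name is hypothetical. It checks syntax only, not whether a code is actually assigned in ISO 639 or ISO 3166 or registered with IANA.

```python
import re

# RFC 1766 grammar (tags are case-insensitive):
#   Language-Tag = Primary-tag *( "-" Subtag )
#   Primary-tag  = 1*8ALPHA
#   Subtag       = 1*8ALPHA
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def is_rfc1766_tag(tag: str) -> bool:
    """Hypothetical helper: True if `tag` matches the RFC 1766 syntax."""
    return _RFC1766_TAG.fullmatch(tag) is not None

if __name__ == "__main__":
    for t in ["en-US", "i-navajo", "x-klingon", "en_US", "abcdefghi"]:
        print(t, is_rfc1766_tag(t))
```

A successor grammar would only require a different pattern, for example one allowing digits in subtags. The parser itself stays the same.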

> More important,
> however, in my mind, is the precedent it sets for the *NON*-use of
> tag characters, or rather the *NON*-misuse of tag characters. Most
> of us, including those of us culpable in the definition of the
> tag characters (which John Cowan pointed out were defined to head
> off a worse threat to UTF-8) would prefer not to see them in
> wide use, but rather the use of standard tagging mechanisms like
> XML or HTML.

Wow. You too.

I honestly had no idea that the use of Plane 14 language tags, defined
as they are in a Unicode Technical Report, was so strongly deprecated
by everyone "in the know" about Unicode, including their own creators.
I had read UTR #7 at face value, as describing an optional mechanism
that might help with certain processes but that we were under no
obligation to use. Now it appears that Plane 14 language tags have
the RFC 1815 nature ("Here's something you can use, but for God's sake,
please don't use it").

I really do see a fair amount of potential in the ability to specify
the language of plain Unicode text in an all-Unicode way. It may seem
hard to believe sometimes, but there is still a lot of plain text out
there among all the HTML, XML, PDF files, Word documents, and other
fancy text that gets all the attention today. (I feel a little like
Frank da Cruz, arguing that in this age of Web browsers there are still
a lot of terminals and emulators out there.)

I have suggested on this list using Plane 14 tags to assist in glyph
selection between C, J, and K or between Russian italics and Serbian
italics because I thought they would provide a nice, all-Unicode
solution *without* resorting to higher protocols. Other Unicode
mechanisms, like LTR and RTL directional overrides and ligation control
via ZWJ and ZWNJ (to name only two), seem to have been invented for
exactly that purpose. However, if the experts really don't want Plane
14 tags in general use, then I guess I should stop trying to promote
them (but that will *not* make the need go away, and it will not result
in the automatic conversion of all language-sensitive plain text to a
fancy-text format). Perhaps the wording in UTR #7 or its successor can
be strengthened to show that these tags are discouraged in favor of
HTML or XML, so people like me won't be misled.
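To make the plain-text mechanism concrete, here is a minimal Python sketch of how a Plane 14 language tag is spelled under the UTR #7 scheme. The tag starts with U+E0001 LANGUAGE TAG. Each ASCII character of the language tag is then shifted into the tag-character range U+E0020..U+E007E, and the sequence U+E0001 U+E007F (LANGUAGE TAG + CANCEL TAG) ends the tag's scope. The function names are hypothetical and do not come from any library. The example character U+76F4 is simply one Han character whose preferred glyph is commonly said to differ between Chinese and Japanese typography.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG: introduces a language tag
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E

def make_language_tag(code: str) -> str:
    """Hypothetical helper: spell `code` (e.g. "ja") as Plane 14 tag characters."""
    if not code or any(not 0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("tag content must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in code)

def cancel_language_tag() -> str:
    """Hypothetical helper: LANGUAGE TAG + CANCEL TAG ends any language tag."""
    return chr(LANGUAGE_TAG) + chr(CANCEL_TAG)

def strip_tags(text: str) -> str:
    """Drop all Plane 14 tag characters, as a tag-unaware process might."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)

if __name__ == "__main__":
    tagged = make_language_tag("ja") + "\u76f4" + cancel_language_tag()
    print([f"U+{ord(c):04X}" for c in tagged])
    print(strip_tags(tagged))
```

Because the tag characters are ordinary code points, a process that does not care about language can strip them out. It does not need to parse any markup to do so.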

> Keeping the defined use of language tag characters
> "in house" in the UTC makes it more difficult for arbitrary other
> organizations to start proliferating usages of Plane 14 tag
> characters that we would rather prefer not to see happen.

I am confused by this. Doesn't the UTC determine what Plane 14 tag
characters will be created? How can "arbitrary other organizations"
create their own tags contrary to the intent of the UTC? I have the
feeling this is not what was meant, but I need a clarification.

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT