From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Feb 06 2003 - 14:54:04 EST
Doug wrote:
> Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>
> > Unicode 4.0 will be quite specific: P14 tags are "reserved for
> > use with particular protocols requiring their use" is what the
> > text will say more or less.
>
> I didn't know the question of what to do about Plane 14 language tags
> had already been resolved.
>
> If that is the case, it might make sense to add an explanatory note to
> the Public Review item on Plane 14 tags, or simply to remove the item.
The issue up for public review, as it states, is about
formal *deprecation* of the Plane 14 Language Tags.
The UTC already has consensus on limiting the use and contexts
of use of the language tag characters. Such language was written
into Unicode 3.1:
"The [language tag] characters... provide a mechanism for
language tagging in Unicode plain text. *However,
the use of these characters is strongly discouraged.*
The characters in this block are reserved for use with special
protocols. They are *not* to be used in
the absence of such protocols, or with *any*
protocols that provide alternate means for language tagging,
such as HTML or XML. The requirement for language information
embedded in plain text data is often overstated. ...
"Because of the extra implementation burden, language tags should
be avoided in plain text unless language information is required
and it is known that the receivers of the text will properly
recognize and maintain the tags...
"Language tags should also be avoided wherever higher-level
protocols, such as a rich-text format, HTML or MIME, provide
language attributes."
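To make the "extra implementation burden" concrete: a language tag is U+E0001 LANGUAGE TAG followed by tag characters that mirror printable ASCII at an offset of 0xE0000, so "fr" becomes U+E0066 U+E0072. The following is only a rough sketch, assuming that layout; the function names are made up for illustration. It builds a tagged string and shows what a receiver that does not maintain the tags would do, namely strip them.

```python
# Illustrative sketch only; function names are hypothetical.
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII 0x20..0x7E

def make_language_tag(lang_id: str) -> str:
    """Encode a printable-ASCII language ID (e.g. 'en-US') as Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang_id):
        raise ValueError("language ID must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang_id)

def strip_tags(text: str) -> str:
    """Drop U+E0001 and U+E0020..U+E007F, as a tag-unaware receiver might."""
    return "".join(
        c for c in text
        if not (ord(c) == 0xE0001 or 0xE0020 <= ord(c) <= 0xE007F)
    )

# Tag "bonjour" as French, then cancel the language tag (U+E0001 U+E007F).
tagged = make_language_tag("fr") + "bonjour" + LANGUAGE_TAG + CANCEL_TAG
print(len(tagged))         # 12 code points, only 7 of them visible text
print(strip_tags(tagged))  # bonjour
```

Every process that touches such a string has to either preserve those invisible code points correctly or remove them, which is exactly the burden the quoted text warns about.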
This language is carried forward, as with the rest of the
Unicode 3.1 and Unicode 3.2 text, into the consolidated text
of Version 4.0 of the standard.
The UTC also long ago approved UTR #20, which states that
language tags...
"...were solely included for the benefit of those Internet
protocols, such as ACAP, which require a standard mechanism
for marking language in UTF-8 strings, and at the same time
to avoid the use of other tagging schemes that relied on
specific details of the encoding form used."
So what we are talking about here is not opening up again
the wonderful world of what language tag characters are
good for, and broadening their use.
The issue on the table is:
Because the UTC has determined that the use of language
tag characters is to be strongly discouraged, and is limited
in any case to very particular protocols, should the
UTC take one step further and declare them formally
*deprecated*?
The result of a decision to deprecate would be to add a statement
to that effect in the block description in Unicode 4.0 for
the language tag characters, and to add the code points
U+E0001, U+E0020..U+E007F to the list of code points which
get the Deprecated property in PropList.txt.
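For a sense of how small that data change is, here is a sketch of what the two new entries might look like in PropList.txt's usual "code point(s) ; property # comment" layout. The fragment is hypothetical and was not copied from any release. It comes with a minimal parser that collects every code point carrying a given property.

```python
# Hypothetical PropList.txt-style fragment (not copied from any UCD release).
SAMPLE = """\
E0001         ; Deprecated # Cf       LANGUAGE TAG
E0020..E007F  ; Deprecated # Cf  [96] TAG SPACE..CANCEL TAG
"""

def parse_property(text: str, prop: str) -> set:
    """Return the set of code points listed with property `prop`."""
    cps = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop trailing comments
        if not data:
            continue
        field, name = (part.strip() for part in data.split(";", 1))
        if name != prop:
            continue
        if ".." in field:                      # a range such as E0020..E007F
            lo, hi = (int(x, 16) for x in field.split(".."))
        else:                                  # a single code point
            lo = hi = int(field, 16)
        cps.update(range(lo, hi + 1))
    return cps

deprecated = parse_property(SAMPLE, "Deprecated")
print(len(deprecated))            # 97 (1 + 96)
print(0xE0041 in deprecated)      # True: TAG LATIN CAPITAL LETTER A
```

Nothing about the characters' encoding or behavior changes. Only the property value does.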
That's it. That's what is on the table for comment and
eventual decision by the UTC.
My personal opinion? The whole debate about deprecation of
language tag characters is a frivolous distraction from
other technical matters of greater import, and things would
be just fine with the current state of the documentation.
But, if formal deprecation by the UTC is what it would take
to get people to stop advocating more use of the language
tags after the UTC has long determined that their use is
strongly discouraged, then so be it.
--Ken