Re: In defense of Plane 14 language tags (long)

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Mon Nov 04 2002 - 12:21:17 EST

Next message: Joseph Boyle: "RE: PRODUCING and DESCRIBING UTF-8 with and without BOM"

Previous message: Edward H Trager: "Re: PRODUCING and DESCRIBING UTF-8 with and without BOM"
In reply to: Doug Ewell: "In defense of Plane 14 language tags (long)"
Next in thread: John Hudson: "Re: In defense of Plane 14 language tags (long)"
Reply: John Hudson: "Re: In defense of Plane 14 language tags (long)"

Doug Ewell wrote:

> 1. Language tags may be useful for display issues.
...

> For example, it is often said that Japanese
> users prefer “Japanese-style” glyphs universally, even for Chinese text.
>
> The Plane 14 tagging approach is not perfect, but it is sufficient to
> solve this problem. Japanese users who prefer “Japanese-style” glyphs
> universally can tag all Han text as “ja”, which may be linguistically
> wrong but achieves the desired effect. Users who want Chinese glyphs
> for Chinese-language text and Japanese glyphs for Japanese-language text
> can tag the former as “zh” and the latter as “ja” as they see fit.
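To make "tag all Han text as ja" concrete at the character level: a Plane 14 language tag is U+E0001 LANGUAGE TAG followed by tag-character copies of the ASCII letters of the tag (U+E0020..U+E007E, i.e. ASCII + 0xE0000), and U+E007F CANCEL TAG ends it. The Python below is only a minimal sketch under those assumptions. The function names tag_language and split_tagged_runs are hypothetical, and the parser ignores nesting and any non-language tag types.

```python
from __future__ import annotations

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at U+E0020..U+E007E


def tag_language(lang: str, text: str) -> str:
    """Wrap text in a Plane 14 language tag, e.g. tag_language("ja", ...)."""
    if not lang or not all(0x20 < ord(c) < 0x7F for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    tag = LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    return tag + text + CANCEL_TAG


def split_tagged_runs(s: str) -> list[tuple[str | None, str]]:
    """Split s into (language, text) runs, dropping the tag characters."""
    runs: list[tuple[str | None, str]] = []
    buf: list[str] = []
    lang: str | None = None
    i = 0
    while i < len(s):
        c = s[i]
        if c in (LANGUAGE_TAG, CANCEL_TAG):
            if buf:  # close the run that used the previous language
                runs.append((lang, "".join(buf)))
                buf = []
            i += 1
            if c == CANCEL_TAG:
                lang = None
                continue
            # Collect the tag characters that spell the language tag.
            name = []
            while i < len(s) and 0xE0020 <= ord(s[i]) <= 0xE007E:
                name.append(chr(ord(s[i]) - TAG_OFFSET))
                i += 1
            lang = "".join(name) or None
        else:
            buf.append(c)
            i += 1
    if buf:
        runs.append((lang, "".join(buf)))
    return runs


mixed = tag_language("zh", "漢字") + tag_language("ja", "漢字")
print(split_tagged_runs(mixed))  # [('zh', '漢字'), ('ja', '漢字')]
```

The same two Han characters come back as separate runs, one tagged "zh" and one tagged "ja". Which glyphs each run then gets is left entirely to the renderer.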

The "user" viewing the text (and preferring 'Japanese-style' glyphs)
may be a different person from the "user" authoring the text (and inserting
the Plane 14 tags); in fact, the user viewing the text may not be able
to modify the Plane 14 tags, or may not even be aware of them.

I think this argument should be reworded, based on a clear distinction
between the various "users".

> Other scripts besides Han can benefit from plain-text language tagging
> as well. A common Latin-script example

...

A common Cyrillic example is the difference in the italic forms for,
e. g., Russian and Serbian, cf. "Rendering Serbian italics" (used to
be at <http://www.tiro.com/transfer/Serbian_Rendering.pdf> -- John,
can we have it back?).

Other examples include the different cursive (handwriting) forms;
e. g., a UK-style handwritten "I" is read as a "T" by most Germans, and
the Russian-Serbian contrast mentioned above also applies to cursive.

> 2. Language tags may be useful for non-display issues.
...

> 3. Conflict with HTML/XML tags need not be a problem.
...

> The potential disruption caused by this scenario is probably overstated.
> Almost every HTML file ever created contains at least one plain-text
> line separator (CR and/or LF) and at least one HTML-style line separator
> (<br> and/or <p>). Which to follow? The HTML specification very
> clearly states that the higher-level protocol takes precedence in this
> case (unless <pre>preformatted text</pre> is explicitly indicated). The
> same could be said for the interaction between Plane 14 language tags
> and HTML language tags.

Another possibility is to define a clear rule for their mutual interaction.

Paradigms to follow include:

- the interaction between Unicode formatting characters, such as U+200E,
  U+200F, and U+202A through U+202E, and HTML markup, such as
  the dir attribute and the BDO element (cf.
  <http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2>),

- the interaction between HTTP header fields and the HTML META element,
  e. g., the HTTP Content-Type header, including its charset parameter
  (cf. <http://www.w3.org/TR/html401/charset.html#h-5.2.2>).
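HTML 4.01, section 5.2.2, settles its charset question with an ordered list: the HTTP charset parameter first, then a META declaration, then the charset attribute on a link. A language rule in the same spirit could be a similar ordered list where the first source that specifies a language wins. The sketch below is purely hypothetical. The ordering (HTML lang attribute over Plane 14 tag over a default such as an HTTP Content-Language value) is an assumption for illustration only and is not specified by HTML or Unicode. It also treats an empty string as "not specified", which simplifies HTML's lang="" (meaning "unknown").

```python
from __future__ import annotations

from typing import Optional


def effective_language(
    html_lang: Optional[str],
    plane14_lang: Optional[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the highest-priority language that is actually specified.

    Hypothetical priority, modelled on the HTML 4.01 charset rules:
      1. lang attribute from the HTML markup (the higher-level protocol)
      2. a Plane 14 language tag found in the plain text
      3. a default, e.g. taken from an HTTP Content-Language header
    """
    for candidate in (html_lang, plane14_lang, default):
        if candidate:  # None and "" both count as "not specified" here
            return candidate
    return None


print(effective_language("de", "en"))        # de
print(effective_language(None, "en"))        # en
print(effective_language(None, None, "de"))  # de
```

The value of such a rule lies less in the code than in writing the order down once, so that every user agent resolves a disagreement between markup and Plane 14 tags the same way.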

> 4. The original need for language tags has not disappeared.

...

> 5. “Statefulness” disadvantage is exaggerated.
...

> 6. Plane 14 tags are easy to filter out, and harmless if not
> interpreted.

...

> Tags [...] do not affect searching,

There are indeed situations where language tags would affect searching
if not handled properly.
Example: In my German WWW pages, I take pains to tag all English terms,
in the hope of helping speech synthesizers and other clients that depend
on correct identification of the language. Now, German attaches prefixes
and suffixes to word stems, and also tends to form compounds.
Of course, I have to confine my LANG=EN span to the English word proper.
This leads to monsters such as
 <SPAN LANG=EN>E-Mail</SPAN>-Adresse
 <SPAN LANG=EN>Mailing</SPAN>listen
 ... aus den <SPAN LANG=EN>Received</SPAN>-Headern ...

A search engine should remove these tags before comparing a search argument
with this sort of text. For perfect results, this normalization should be
applied to HTML tags and Unicode tags alike. (I fear that Google is not
that smart, but I haven't tested it.)
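A minimal sketch of that normalization: strip markup and Plane 14 tag characters (the whole block U+E0000..U+E007F) from both the search argument and the text, then compare case-insensitively. The function names are hypothetical, and the markup stripper is a deliberately naive regular expression, not a real HTML parser. A production search engine would also need to handle entities, comments, and so on.

```python
import re

# Plane 14 tag characters: LANGUAGE TAG, the ASCII look-alikes and
# CANCEL TAG all fall into the block U+E0000..U+E007F.
PLANE14_TAGS = re.compile("[\U000E0000-\U000E007F]")
# Naive markup stripper: enough for <SPAN LANG=EN>...</SPAN>, but it does not
# understand comments, CDATA, or ">" inside attribute values.
HTML_TAGS = re.compile(r"<[^>]*>")


def normalize_for_search(text: str) -> str:
    """Remove HTML markup and Plane 14 tags, then casefold for comparison."""
    text = HTML_TAGS.sub("", text)
    text = PLANE14_TAGS.sub("", text)
    return text.casefold()


def matches(query: str, document: str) -> bool:
    """Substring search on the normalized forms of query and document."""
    return normalize_for_search(query) in normalize_for_search(document)


html = "Bitte Ihre <SPAN LANG=EN>E-Mail</SPAN>-Adresse angeben"
tagged = "Zwei \U000E0001\U000E0065\U000E006EMailing\U000E007Flisten"
print("E-Mail-Adresse" in html)           # False: the markup splits the word
print(matches("E-Mail-Adresse", html))    # True
print(matches("Mailinglisten", tagged))   # True
```

Because the same function runs on both sides of the comparison, a query that happens to contain tags or markup is normalized the same way as the indexed text.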

So, for Doug's issue #6, the correct argument is probably:
Plane 14 tags do not affect searching any more than higher-level tags do.

> 7. Rapid deprecation creates an image of instability.
...

> 8. Other, as yet uninvented tags would be implicitly deprecated.
...

Best wishes,
Otto Stolz


This archive was generated by hypermail 2.1.5 : Mon Nov 04 2002 - 12:56:13 EST