On Wed, 16 Jul 1997, Glenn Adams wrote:
> At 12:42 PM 7/15/97 -0700, Markus G. Kuhn wrote:
> >Many systems already have their own language tagging mechanism and do
> >not need an additional one from Unicode. For instance, in HTML 4.0
> ><http://www.w3.org/TR/WD-html40/>, you can write things like
>
> While HTML and other application conventions solve the language tagging
> problem in particular domains, they do not do so in a way that satisfies
> this requirement in plain text domains. The proposed mechanism does not
> conflict with the HTML mechanism; indeed, the HTML mechanism would be
> preferred in that context. Note that a similar issue arises with respect
> to bidirectional overrides and embedding levels. 10646 encodes these
> directly as their absence would preclude minimum legibility of many bidi
> texts in the plain text context. However, when using a richer representation,
> like HTML, these should generally be replaced with markup at the higher
> level. Language information can be handled similarly.

There are some similarities between the BIDI "control" characters and
the language tags, but also some important differences:
- The need for BIDI information arises as soon as you can no
longer guarantee the width of a displayed text. It ensures
that words are given the right sequence within a line,
and is therefore rather crucial for basic readability.
Language information has various applications, but they
are all related to much more sophisticated operations
than variable-width formatting, and basic readability
is not an issue.
- Not surprisingly, BIDI codes have a long tradition in plain
text, and HTML markup has been modelled after this
tradition and existing standards. Language tags, also
not surprisingly, don't have much of a tradition in
plain text, and the proposals currently discussed are
modelled after marked-up text. Please don't say that
this is due to Unicode; if language tags were that
seriously necessary, they would already have been
introduced for iso-8859-1 and many other "charset"s.
- In RFC 2070 and in HTML 4.0, BIDI "control" codes are allowed
in parallel with HTML BIDI markup, but their use is
strongly discouraged because it's very difficult to
keep both variants in sync, and to distinguish between
bidi information for the markup and for the final text
when editing raw HTML. Tolerance for BIDI "control" codes
was added at a rather late stage at the request of an
Israeli specialist who worried about the ease of
including existing plain text into HTML.
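
To make the correspondence between BIDI "control" codes and
HTML BIDI markup concrete, here is a minimal sketch of how
existing plain text might be converted on inclusion into HTML.
It assumes the usual pairing (LRE/RLE with SPAN DIR, LRO/RLO
with BDO DIR, PDF with the end tag). The function name, the
lowercase tag spelling, and the handling of unmatched codes
are my own assumptions for illustration, not something
prescribed by RFC 2070 or HTML 4.0.

```python
import html

# Unicode BIDI embedding and override "control" characters.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

# Assumed pairing of each opening code with an HTML element and direction.
OPENERS = {
    LRE: ("span", "ltr"),
    RLE: ("span", "rtl"),
    LRO: ("bdo", "ltr"),
    RLO: ("bdo", "rtl"),
}


def bidi_controls_to_markup(text):
    """Replace BIDI control codes in plain text with HTML BIDI markup.

    Hypothetical helper: an unmatched PDF is dropped, and embeddings
    still open at the end of the input are closed there.
    """
    out = []
    stack = []  # element names of the embeddings currently open
    for ch in text:
        if ch in OPENERS:
            tag, direction = OPENERS[ch]
            out.append('<%s dir="%s">' % (tag, direction))
            stack.append(tag)
        elif ch == PDF:
            if stack:
                out.append("</%s>" % stack.pop())
        else:
            # Escape &, < and > so plain text cannot turn into markup.
            out.append(html.escape(ch, quote=False))
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

After such a conversion only one variant is left in the
document, which avoids the problem of keeping the two in sync.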
With respect to the currently discussed "plain-text
language tags", there is neither a need nor a plan
to allow them in HTML. HTML has its mechanism providing
language information, and other formats can choose
between conversion and failure.
Regards, Martin.