Re: Plane 14 language tags

From: Mark E. Davis (markdavis@ispchannel.com)
Date: Wed Jan 26 2000 - 09:12:02 EST


> With plain otherwise untagged text there are many cases where
> you can't know how to sensibly render the text (even if you know
> what the basic script is) without also knowing what the language is.
>

I disagree. In the absence of any other information, in the vast majority of
cases if you render plain text in the default fonts chosen by the user for
his or her computer, you will get perfectly acceptable results. (Of course, to be
"acceptable" you have to have some choice of fonts on the machine -- but if you
don't have any choice, typically you couldn't do any better if the language were
tagged.) These fonts may be derived from the default language on the user's
machine (or may be simply chosen independently). For really precise results,
what people actually want is font tagging, not language tagging.

Thus at most, this could be reworded to:

"With plain otherwise untagged text there are some cases where
you can't know how to optimally render the text (even if you know
what the basic script is) without also knowing what the language is."

Even this way, the "some" may convey too strong an impression. No one has ever
come forward with actual cases where the above approach is insufficient.

Mark

Christopher John Fynn wrote:

> Martin
>
> I was thinking of plain, otherwise untagged, text. Of course if you are
> using HTML, XML etc. you should *only* be using the language tagging
> mechanisms available and supported in those standards.
>
> With plain otherwise untagged text there are many cases where
> you can't know how to sensibly render the text (even if you know
> what the basic script is) without also knowing what the language
> is. In those cases at least, I think there is a strong argument that there
> should be some way of indicating this within the plain text standard
> itself. These plane 14 characters provide a means of doing this.
>
> There is nothing which states that if HTML and XML applications
> use ISO10646 character encoding that they must recognise and
> deal with these language tag characters - only that they should
> accept them. Such applications could simply display some kind of
> placeholder glyph - or not display them at all. If I were writing an
> XML or HTML editor I would probably include the option of
> converting such characters to equivalent XML or HTML tags
> when importing plain text, and have the option of trying to
> convert XML or HTML language tags to these characters when
> exporting to a plain Unicode text file without other mark-up.
>
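To make the import conversion described above concrete, here is a minimal
sketch in Python. It relies on the tag characters U+E0020..U+E007E mirroring
ASCII 0x20..0x7E, introduced by U+E0001 LANGUAGE TAG. The function names, the
choice of <span lang="..."> as the target markup, and the simplified handling
of U+E007F CANCEL TAG are assumptions made for illustration, not anything
proposed in this thread.

```python
import html

LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000     # U+E0020..U+E007E mirror ASCII 0x20..0x7E


def encode_language_tag(lang):
    """Return the Plane 14 sequence for an ASCII language code such as 'fr'."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tags_to_markup(text):
    """Hypothetical import filter: Plane 14 language tags -> <span lang=...>."""
    out = []
    open_span = False
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            # Collect the tag characters that spell the language code.
            i += 1
            lang = []
            while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                lang.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            if open_span:
                out.append("</span>")
            open_span = bool(lang)
            if lang:
                out.append('<span lang="%s">' % html.escape("".join(lang)))
        elif cp == CANCEL_TAG:
            # Simplification: a cancel tag just closes the current span.
            if open_span:
                out.append("</span>")
                open_span = False
            i += 1
        else:
            out.append(html.escape(text[i]))
            i += 1
    if open_span:
        out.append("</span>")
    return "".join(out)


sample = encode_language_tag("fr") + "bonjour" + chr(CANCEL_TAG) + " ok"
print(tags_to_markup(sample))
```

The export direction would be the reverse walk: emit encode_language_tag()
wherever a lang attribute takes effect and U+E007F where it ends.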
> The other thing I was (wryly) observing is that once
> you put some sort of language tags in text - and make use of
> them - then, for about the same amount of work, you could be using
> language tags (not necessarily of the same sort) to switch
> between standard encodings for those scripts - something
> most people using Unicode are trying to get away from.
> This observation of course naively assumes a standard
> encoding for each script. I wasn't trying to suggest
> for a moment anyone should actually do this - simply
> trying to point out that in many cases you can't get
> away from specifying language as well as script system
> even if you want to. Script systems are of course
> more or less "tagged" simply by the code range of
> the characters.
>
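The remark that scripts are more or less "tagged" by code range can be
sketched as a simple range lookup. The table below is a small hand-picked
subset of Unicode block ranges and block_of() is a hypothetical helper; a
real implementation would need the full block list, and blocks are only an
approximation of scripts.

```python
# (first, last, block name); a deliberately incomplete subset for illustration
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch):
    """Return the block name for ch, or None if it is outside this subset."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in ("a", "\u044f", "\u3042", "\u6f22"):
    print("U+%04X" % ord(ch), block_of(ch))
```

Note what the lookup cannot tell you: U+6F22 lands in the same block whether
the surrounding text is Chinese or Japanese, so the code range identifies the
script but says nothing about the language.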
> - Chris
>
> BTW shouldn't we be speaking of "Plane 0E language tags" instead of
> "Plane 14" since hexadecimal notation is normative in the ISO 10646
> and Unicode Standards? ISO 10646 has two hundred and fifty six
> planes (00 to FF) so there is another "plane 14" (hex). [For similar
> reasons I think the use of hexadecimal character entities rather
> than decimal character entities should be encouraged in XML & HTML]
>
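The decimal/hexadecimal ambiguity raised in the BTW above is easy to check
with a little arithmetic. The sketch below assumes only that the plane number
is the code point shifted right by 16 bits; the helper names are made up for
this example.

```python
def plane_of(code_point):
    """Plane number of a code point (each plane holds 0x10000 code points)."""
    return code_point >> 16


def hex_entity(ch):
    """Hexadecimal numeric character reference, e.g. '&#xE9;'."""
    return "&#x%X;" % ord(ch)


def decimal_entity(ch):
    """Decimal numeric character reference, e.g. '&#233;'."""
    return "&#%d;" % ord(ch)


language_tag = 0xE0001                       # U+E0001 LANGUAGE TAG
print(plane_of(language_tag))                # 14 (decimal)
print("%02X" % plane_of(language_tag))       # 0E (hexadecimal)

# The *other* "plane 14": hex 14 is decimal 20, i.e. code points 140000..14FFFF,
# which exist in the ISO 10646 code space but lie beyond U+10FFFF.
print(plane_of(0x140000))                    # 20

print(hex_entity("\u00e9"), decimal_entity("\u00e9"))   # &#xE9; &#233;
```

The hexadecimal reference matches the U+00E9 notation used in the code charts
digit for digit, which is the practical argument for preferring it.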
> ----- Original Message -----
> From: Martin J. Duerst <duerst@w3.org>
> To: Unicode List <unicode@unicode.org>
> Cc: Unicode List <unicode@unicode.org>
> Sent: Wednesday, January 26, 2000 7:11 AM
> Subject: Re: Plane 14 language tags
>
> > At 09:11 00/01/19 -0800, Christopher John Fynn wrote:
> > > One reason it might be a good idea to encourage the use of plane 14
> > > language tags rather than tagging schemes outside Unicode is that once
> > > you have
> > > external tags it becomes almost as easy to support a set of separate
> > > encoding standards for individual languages as it is to support Unicode.
> >
> > Chris - I think this is completely wrong. Please have a look at HTML
> > and XML. Using mixed encodings in those cases would be a true nightmare.
> >
> > Regards, Martin.
> >
> >
> > #-#-# Martin J. Du"rst, World Wide Web Consortium
> > #-#-# mailto:duerst@w3.org http://www.w3.org
> >


