From: Thomas M. Widmann (thomas@widmann.uklinux.net)
Date: Mon Nov 04 2002 - 17:25:18 EST
"Doug Ewell" <dewell@adelphia.net> writes:
> OK, so my "mini-essay" against deprecating the Plane 14 language tags
> didn't turn out quite so "mini" after all.
It was very interesting.
> [...]
> Other scripts besides Han can benefit from plain-text language tagging
> as well. A common Latin-script example is that acute accents over
> Polish letters have a noticeably steeper slant than they do over (e.g.)
> French letters. Fonts are not usually designed to put a steeper mark
> over a letter like ź used in Polish and a more horizontal mark over a
> letter like á used in French. Language tags can be used in conjunction
> with display engines to indicate a preference between alternative glyphs
> in such cases.
One might add many other examples, e.g.:
Let's assume I like to read my email in a fraktur font using
superscript 'e' for the umlaut, but -- in line with traditional
typesetting -- prefer non-fraktur with a normal dieresis for words in
Romance languages. How is this supposed to work if my email program
doesn't know the languages involved? (This does not necessarily mean
that people should send me language-tagged emails -- I might in
principle have a small preprocessor which language-tags all incoming
emails.)
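
As a rough illustration of the tagging half of such a preprocessor, here
is a minimal Python sketch. It assumes the language of the text is
already known (detecting it is the hard part and is not shown), and the
function names are made up for this example. It relies only on the
Plane 14 layout: U+E0001 LANGUAGE TAG followed by tag characters that
mirror ASCII at an offset of 0xE0000.

```python
# Hypothetical helpers; the names are illustrative, not from any library.
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII 0x20..0x7E


def encode_language_tag(lang):
    """Return the Plane 14 tag sequence for an ASCII language code such as 'de'."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be non-empty printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tag_text(text, lang):
    """Prefix plain text with a language tag (language assumed known)."""
    return encode_language_tag(lang) + text


def strip_tags(text):
    """Remove all Plane 14 tag characters, e.g. for software that ignores them."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


tagged = tag_text("Grüße", "de")
plain = strip_tags(tagged)
```

A display engine that understands the tag could then pick a fraktur
font for the "de" run; one that does not can drop the tag characters
with something like strip_tags.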
Or what about Coptic? Unicode encodes most Coptic letters as Greek,
which means that the same font cannot be used for displaying Greek and
Coptic. (TUC 3.0, p. 168: "Texts that mix Greek and Coptic languages
together must employ appropriate font style associations.") How is
this supposed to work if one doesn't know what is Greek and what is
Coptic?
And how can you uppercase 'i' if you don't know whether it's
occurring in a Turkish word or not?
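
To make the problem concrete, here is a small Python sketch. Python's
built-in str.upper() does not take a language, so it always maps 'i' to
'I'. The helper below is hypothetical and only handles the dotted and
dotless i distinction for the language codes "tr" and "az"; it is not a
full implementation of language-sensitive casing.

```python
# Hypothetical helper; only the dotted/dotless i rule is handled.
TURKIC_LANGS = {"tr", "az"}


def upper_for_language(text, lang):
    """Uppercase text, mapping 'i' to U+0130 (İ) when lang is Turkish or Azerbaijani."""
    if lang in TURKIC_LANGS:
        # Dotted lowercase i must become dotted capital İ, not plain I.
        text = text.replace("i", "\u0130")
    # Dotless ı (U+0131) already uppercases to plain I by default.
    return text.upper()


turkish = upper_for_language("istanbul", "tr")
default = upper_for_language("istanbul", "en")
```

The same input string gives different results depending on the language
code passed in, which is exactly the information that untagged plain
text lacks.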
Finally, I believe proper kerning is language-dependent, and so
impossible without knowledge of the languages used.
Proponents of deprecating language tags probably assume that plain
text isn't much used and that higher-level protocols can therefore
nearly always be used, but that is not the case in my experience:
plain text is still widely used.
/Thomas
PS: I'm new to this list, but once upon a time when I was studying
Georgian in Tbilisi (some six years ago), I participated in the
ISO-10646 mailing list which was later discontinued. I have a
combined degree in linguistics and computer science and work in
dictionary publishing.
-- 
Thomas Widmann, MA
Mavisbank Gardens, Glasgow, Scotland, EU
thomas@widmann.uklinux.net
http://www.widmann.uklinux.net