Re: In defense of Plane 14 language tags (long)

From: Thomas M. Widmann (thomas@widmann.uklinux.net)
Date: Mon Nov 04 2002 - 17:25:18 EST


    "Doug Ewell" <dewell@adelphia.net> writes:

    > OK, so my "mini-essay" against deprecating the Plane 14 language tags
    > didn't turn out quite so "mini" after all.

    It was very interesting.

    > [...]
    > Other scripts besides Han can benefit from plain-text language tagging
    > as well. A common Latin-script example is that acute accents over
    > Polish letters have a noticeably steeper slant than they do over (e.g.)
    > French letters. Fonts are not usually designed to put a steeper mark
    > over a letter like ź used in Polish and a more horizontal mark over a
    > letter like á used in French. Language tags can be used in conjunction
    > with display engines to indicate a preference between alternative glyphs
    > in such cases.

    One might add many other examples, e.g.:

    Let's assume I like to read my e-mail in a fraktur font using
    superscript 'e' for the umlaut, but -- in line with traditional
    typesetting -- to use non-fraktur with a normal dieresis for words in
    Romance languages. How is this supposed to work if my email program
    doesn't know the languages involved? (This does not necessarily mean
    that people should send me language-tagged emails -- I might in
    principle have a small preprocessor which language-tags all incoming
    emails.)
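    To make the preprocessor idea a little more concrete, here is a minimal
    Python sketch of the tagging half. It assumes the language of each span
    is already known (detecting it is a separate problem and is not shown),
    and it follows the Plane 14 scheme as I understand it: U+E0001 LANGUAGE
    TAG, then the tag's ASCII characters shifted into U+E0020..U+E007E, with
    U+E0001 followed by U+E007F CANCEL TAG ending the tagged span. The names
    tag_language and tag_segments are made up for illustration.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a Plane 14 language tag, cancelling it at the end."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    # Open: LANGUAGE TAG followed by tag characters spelling out lang.
    opening = chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    # Close: LANGUAGE TAG followed by CANCEL TAG cancels the language tag.
    closing = chr(LANGUAGE_TAG) + chr(CANCEL_TAG)
    return opening + text + closing


def tag_segments(segments):
    """Tag a sequence of (language, text) pairs, e.g. from a hypothetical detector."""
    return "".join(tag_language(text, lang) for lang, text in segments)


if __name__ == "__main__":
    tagged = tag_segments([("de", "Grüße aus "), ("fr", "Paris")])
    # Tag characters are invisible, so print code points instead.
    print(" ".join(f"U+{ord(c):04X}" for c in tagged))
```

    The tags are invisible in a renderer that ignores them, so an untagged
    reader would still see plain "Grüße aus Paris".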

    Or what about Coptic? Unicode encodes most Coptic letters as Greek,
    which means that the same font cannot be used for displaying Greek and
    Coptic. (The Unicode Standard 3.0, p. 168: "Texts that mix Greek and Coptic languages
    together must employ appropriate font style associations.") How is
    this supposed to work if one doesn't know what is Greek and what is
    Coptic?
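    On the receiving side, a renderer would first have to split tagged text
    back into language runs before it could choose a Greek or a Coptic font.
    The Python sketch below does that split. The font names in FONTS are
    placeholders rather than real fonts, "und" (undetermined) is an assumed
    default for untagged text, and a bare U+E007F is treated as cancelling
    all tags.

```python
LANGUAGE_TAG = 0xE0001                  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                    # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring printable ASCII
TAG_OFFSET = 0xE0000

# Placeholder font names; a real renderer would use its own configuration.
FONTS = {"el": "GreekSerif", "cop": "CopticUnicode"}


def split_by_language(s: str, default: str = "und"):
    """Split text containing Plane 14 language tags into (language, text) runs."""
    runs, buf, lang, i = [], [], default, 0

    def flush():
        # Emit pending text under the language currently in effect.
        if buf:
            runs.append((lang, "".join(buf)))
            buf.clear()

    while i < len(s):
        cp = ord(s[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            chars = []
            # Collect tag characters and map them back to ASCII.
            while i < len(s) and TAG_FIRST <= ord(s[i]) <= TAG_LAST:
                chars.append(chr(ord(s[i]) - TAG_OFFSET))
                i += 1
            if i < len(s) and ord(s[i]) == CANCEL_TAG:
                # LANGUAGE TAG + CANCEL TAG: revert to the default language.
                lang = default
                i += 1
            else:
                lang = "".join(chars) or default
        elif cp == CANCEL_TAG:
            # A bare CANCEL TAG cancels all tags.
            flush()
            lang = default
            i += 1
        else:
            buf.append(s[i])
            i += 1
    flush()
    return runs


def fonts_for(s: str):
    """Pair each language run with the font a renderer might pick for it."""
    return [(FONTS.get(lang, "DefaultFont"), text) for lang, text in split_by_language(s)]
```

    Without tags, every run comes back as "und" and the renderer is left
    guessing, which is exactly the Greek/Coptic problem.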

    And how can you uppercase 'i' if you don't know whether it occurs in
    a Turkish word or not? (In Turkish it becomes 'İ', not 'I'.)
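    As a small illustration: Python's str.upper() applies only the
    language-independent case mappings, so 'i'.upper() gives 'I'. Code that
    knows the text is Turkish (or Azeri) has to apply the dotted-i rule from
    Unicode's SpecialCasing data itself. The function upper_for_language is
    hypothetical and deliberately handles only that one rule.

```python
def upper_for_language(text: str, lang: str) -> str:
    """Uppercase text, applying the Turkish/Azeri dotted-i rule when lang asks for it.

    Simplified sketch: only the i -> U+0130 rule from SpecialCasing.txt is
    handled; other language-specific casing rules are ignored.
    """
    primary = lang.split("-")[0].lower()
    if primary in ("tr", "az"):
        # Dotted small i uppercases to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.
        # Dotless U+0131 already maps to plain I under str.upper().
        text = text.replace("i", "\u0130")
    return text.upper()
```

    For example, upper_for_language("istanbul", "en") gives "ISTANBUL",
    while upper_for_language("istanbul", "tr") gives "İSTANBUL". The second
    argument is precisely the information plain text lacks without tags.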

    Finally, I believe proper kerning is language-dependent, and so
    impossible without knowledge of the languages used.

    Proponents of deprecating language tags probably assume that plain
    text isn't much used and that higher-level protocols can therefore
    nearly always be used instead. In my experience that is not the case:
    plain text is still widely used.

    /Thomas

    PS: I'm new to this list, but once upon a time when I was studying
    Georgian in Tbilisi (some six years ago), I participated in the
    ISO-10646 mailing list, which was later discontinued. I have a
    combined degree in linguistics and computer science and work in
    dictionary publishing.

    -- 
    Thomas Widmann, MA           Mavisbank Gardens, Glasgow, Scotland, EU
    thomas@widmann.uklinux.net             http://www.widmann.uklinux.net
    


    This archive was generated by hypermail 2.1.5 : Mon Nov 04 2002 - 18:23:43 EST