Re: In defense of Plane 14 language tags (long)

From: John Cowan
Date: Tue Nov 05 2002 - 07:37:38 EST


    Marco Cimarosti scripsit:

    > { As a side note, the idea that a language may use "foreign" words seems
    > terribly naive to me. It is true that, in Italian, we use loanwords such as
    > "hardware", "punk", or "footing", but it would be silly to consider or tag
    > them as "English words". They are genuinely Italian words, [...]

    In English, however, the distinction between borrowings and truly
    foreign words does make sense. Such a word as Weltanschauung, for example,
    is written in its native orthography complete with capital letter, is
    almost invariably typeset in italics, and is most often (by the educated;
    the uneducated will not know or use it at all) given some approximation
    of its original pronunciation.

    Even in Italian, what about Latin terms embedded in classic poetry?
    Are you going to say that those too are Italian, just with a slightly
    peculiar morphology?

    Hindi-Urdu is another good example. There is a core of common words
    with a common phonology. Then there is a long list of Sanskrit-based
    terms, mostly used in the Hindi varieties of the language, which use
    a reduced form of Skt phonology. Similarly, there is another long list
    of Persian- and Arabic-based terms, mostly used in the Urdu varieties
    of the language (there are lots of Persian and Arabic borrowings in the
    core, however), which use a reduced form of Persian or Arabic phonology.

    > As I see it, the problem is not merely that the two fashions of tags may
    > specify different languages. That would not be a real conflict. It is
    > perfectly legitimate to embed language tags into each other: the rule is
    > that the inner language tag wins. This general rule can be extended to
    > accommodate plain text tags: they will always take precedence, as they
    > clearly are the innermost specification.

    Plain-text tags don't nest, however: you need to give a tag explicitly
    naming the outer language when you return to it.
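
    To make the non-nesting point concrete, here is a minimal Python
    sketch. It assumes the Plane 14 encoding (U+E0001 LANGUAGE TAG, then
    tag characters at U+E0000 plus the ASCII code point, with U+E007F
    CANCEL TAG); the function names lang_tag and tagged_runs are
    hypothetical, made up for illustration. The parser keeps a single
    "current language" and no stack, so a cancel leaves the following
    text untagged rather than restoring the outer language.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag character = TAG_OFFSET + ASCII code point


def lang_tag(code):
    """Encode an ASCII language code such as 'en' as a Plane 14 tag sequence."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def tagged_runs(text):
    """Split text into (language, run) pairs.

    There is deliberately no stack: each language tag simply replaces the
    current language, and a cancel resets it to None (untagged).
    """
    runs, buf, current = [], [], None

    def flush():
        if buf:
            runs.append((current, "".join(buf)))
            buf.clear()

    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == LANGUAGE_TAG:
            flush()
            i += 1
            code = []
            while i < n and 0xE0020 <= ord(text[i]) <= 0xE007E:
                code.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            current = "".join(code) or None
        elif ch == CANCEL_TAG:
            flush()
            current = None
            i += 1
        else:
            buf.append(ch)
            i += 1
    flush()
    return runs


# Returning to English needs an explicit second "en" tag.
explicit = (lang_tag("en") + "a word such as " + lang_tag("de")
            + "Weltanschauung" + lang_tag("en") + " is italicized")
# Cancelling the German tag does not bring English back.
cancelled = (lang_tag("en") + "a " + lang_tag("de")
             + "Weltanschauung" + CANCEL_TAG + " b")
```

    In the second string the trailing " b" comes out with language None:
    nothing remembers that English was in force before the German tag.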

    > If they are rendered as invisible glyphs, they make the text more difficult
    > to edit and to move the cursor within, because the user will have no way of
    > understanding why the cursor stops twice in apparently random positions.
    > This also exposes the information contained in language tags to
    > unintended corruption by subsequent editing.

    This argument proves too much: it applies with equal force to the
    invisible bidi controls and the other Unicode controls. In practice
    these things are not available for plaintext-style editing except in a
    "reveal controls" mode, which could equally well reveal the tags using
    some stylized glyphs.
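
    As a rough illustration of such a "reveal controls" mode, the Python
    sketch below rewrites language tags and a few bidi controls as
    bracketed ASCII markers. The function name reveal_controls and the
    bracket notation are assumptions made for this example, standing in
    for whatever stylized glyphs a real editor would draw.

```python
# Bidi controls and the short names a reveal mode might display for them.
BIDI_NAMES = {
    0x200E: "LRM", 0x200F: "RLM",
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
}


def reveal_controls(text):
    """Return text with Plane 14 language tags and bidi controls made visible."""
    out = []
    i, n = 0, len(text)
    while i < n:
        cp = ord(text[i])
        if cp == 0xE0001:  # LANGUAGE TAG: collect the tag characters after it
            i += 1
            code = []
            while i < n and 0xE0020 <= ord(text[i]) <= 0xE007E:
                code.append(chr(ord(text[i]) - 0xE0000))
                i += 1
            out.append("[lang=" + "".join(code) + "]")
        elif cp == 0xE007F:  # CANCEL TAG
            out.append("[cancel]")
            i += 1
        elif cp in BIDI_NAMES:
            out.append("[" + BIDI_NAMES[cp] + "]")
            i += 1
        else:
            # Ordinary characters (and stray tag characters that do not
            # follow a LANGUAGE TAG) pass through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


sample = "\U000E0001\U000E0064\U000E0065Welt\U000E007F and a\u200fb"
revealed = reveal_controls(sample)
```

    Here revealed is "[lang=de]Welt[cancel] and a[RLM]b", so the tags and
    the bidi mark become equally visible and equally easy to step over.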

    One art / There is                      John Cowan
    No less / No more             
    All things / To do            
    With sparks / Galore                     -- Douglas Hofstadter
