Re: Ellipsis

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Jan 23 2006 - 02:36:23 CST

    On Sun, 22 Jan 2006, Asmus Freytag wrote:

    > Therefore, constructing the ellipsis as a compatibility character in that
    > strict sense is fraught with problems. I think what we have here is a
    > compatibility decomposable character which, on closer inspection, turns out
    > not to be a compatibility character.

    The problem with this way of thinking is that the Unicode standard defines
    the term "compatibility character" so that all compatibility decomposable
    characters are included. Apparently, this definition needs some tuning.
    Since the term "compatibility character" is not rigorously defined
    and is not used in a crucial way in other definitions, it can be tuned and
    even changed, I suppose. It could even be dispensed with, in favor of
    other, less problematic terms.
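
    To make "compatibility decomposable" concrete, here is a minimal Python
    sketch using the standard unicodedata module. The helper name
    has_compat_decomposition is made up for illustration. It only reports
    whether the decomposition mapping carries a tag such as <compat>, and
    says nothing about whether the character "is" a compatibility character
    in the looser sense discussed here.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS


def has_compat_decomposition(ch: str) -> bool:
    """Return True if the decomposition mapping of ch starts with a <tag>.

    Tagged mappings are compatibility decompositions; untagged ones are
    canonical. An empty string means the character has no decomposition.
    """
    return unicodedata.decomposition(ch).startswith("<")


# The ellipsis maps to three full stops under a <compat> tag.
print(unicodedata.decomposition(ELLIPSIS))  # <compat> 002E 002E 002E

# NFKC applies compatibility mappings, so the ellipsis becomes "...".
print(unicodedata.normalize("NFKC", ELLIPSIS) == "...")  # True

# NFC applies only canonical mappings, so the ellipsis is left alone.
print(unicodedata.normalize("NFC", ELLIPSIS) == ELLIPSIS)  # True
```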

    > Overall, there exist in the
    > standard vestiges of the original, necessarily somewhat simplistic model that
    > the original designers of the Unicode standard brought to their task. For
    > them, a compatibility character was a classification that existed in perfect
    > black-and-white clarity.

    The present-day problem with this is that as people learn about Unicode
    and start using it, they try to find simplicity and exactness - and they
    pay attention to remnants of the black-and-white clarity. They may even
    become more papal than the pope, if you allow the expression. And as
    information is disseminated from people who have actually read the Unicode
    standard to people who read second or third hand information about it,
    things tend to get simpler at the cost of correctness (i.e., they become
    simplistic).
    Therefore, at least the standard should describe the current position
    a bit better.

    > Because of this, the actual nature of a character as
    > compatibility character became at once dependent on the type of text in which
    > it is used, and no longer well-aligned with the formal definition of
    > compatibility decomposable character.

    Perhaps the concept "compatibility character" could some day be declared
    historical. Instead, the Unicode standard, or other standards, could
    declare characters as not recommended for use in particular contexts or
    for particular purposes.

    > [Just so no-one misconstrues my position: the status of many compatibility
    > characters, such as the Arabic positional forms, or vertical forms, are
    > definitely *not* in question.]

    They might be declared as not recommended in general, in some suitable
    formulation that is sufficiently far from declaring them as deprecated and
    sufficiently strong to be relevant.

    > The characters that represent repetitions of the same base element, whether
    > the ellipsis, or the quadruple integral, take a special position. By
    > providing the multigraph character, Unicode allows the author to
    > unambiguously state his or her intention of grouping.

    On similar grounds, a four-dot ellipsis character might be justifiable.
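
    The class of "repetitions of the same base element" can be picked out
    mechanically from the character database. The following is a rough
    Python sketch; the function name repeated_base is hypothetical, and the
    test is only an approximation of the grouping idea in the quoted text,
    not a classification the standard itself makes.

```python
import unicodedata
from typing import Optional


def repeated_base(ch: str) -> Optional[str]:
    """Return the base character if ch has a <compat> decomposition that
    repeats one and the same code point two or more times, else None."""
    parts = unicodedata.decomposition(ch).split()
    # Need the tag plus at least two code points.
    if len(parts) < 3 or parts[0] != "<compat>":
        return None
    code_points = parts[1:]
    if len(set(code_points)) != 1:
        return None
    return chr(int(code_points[0], 16))


print(repeated_base("\u2026") == ".")       # True: HORIZONTAL ELLIPSIS
print(repeated_base("\u2a0c") == "\u222b")  # True: QUADRUPLE INTEGRAL OPERATOR
print(repeated_base("\ufb01"))              # None: LATIN SMALL LIGATURE FI
```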

    > PS: I noticed that in this entire discussion, the fact that Unicode must
    > support not only English usage, but other styles and conventions as well
    > seemed to have been forgotten as everyone rushed to take a stance on the
    > arcana of American English (academic) usage.

    I previously mentioned that some languages may use unspaced points.
    In theory, a formatting program might use metainformation about language
    to decide whether a horizontal ellipsis character (or a sequence of three
    full stop characters) should be rendered as spaced or unspaced (and with
    which spacing). But in practice, that would put too much burden on the
    higher levels when a simple distinction can easily be expressed at the
    character level.
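
    For comparison, the higher-level approach could look roughly like the
    Python sketch below. Everything in it is an assumption made for
    illustration: the function render_ellipsis, the style table, and the
    placeholder language tags "x-spaced" and "x-unspaced", which do not
    describe the conventions of any real language.

```python
import re

# Hypothetical style table keyed by placeholder language tags.
# A real formatter would need verified per-language typographic data.
SPACED_STYLE = {"x-spaced": True, "x-unspaced": False}

# Match either U+2026 or a run of exactly three full stops at a time.
ELLIPSIS_PATTERN = re.compile(r"\u2026|\.\.\.")


def render_ellipsis(text: str, lang: str) -> str:
    """Rewrite each ellipsis as spaced or unspaced points, chosen from
    language metadata; unknown languages fall back to unspaced."""
    spaced = SPACED_STYLE.get(lang, False)
    replacement = ". . ." if spaced else "..."
    return ELLIPSIS_PATTERN.sub(replacement, text)


print(render_ellipsis("Wait\u2026", "x-spaced"))   # Wait. . .
print(render_ellipsis("Wait...", "x-unspaced"))    # Wait...
```

    Even this small sketch depends on language tagging being present and
    correct for every run of text, which is the burden referred to above.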

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

