From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Jan 22 2006 - 02:53:40 CST
On 1/21/2006 12:40 AM, Jukka K. Korpela wrote:
>
> The Unicode standard says (in 2.3):
> "Compatibility characters are those that would not have been encoded
> except for compatibility and round-trip convertibility with other
> standards."
> We can take this position strictly and therefore regard HORIZONTAL
> ELLIPSIS as included for such reasons only. But that would be a somewhat
> inconvenient position, since there is seldom any good way to
> create spaced ellipsis dots except by using the HORIZONTAL ELLIPSIS
> character.
>
You can take this position, and with lots of intelligence in your layout
system you might parse and recognize sequences of periods for this or
the other purpose.
However, as you note yourself, it would be inconvenient. But to
understand how inconvenient, you need to consider the whole range of
applications for this character. For example, the Japanese ellipsis is
unified with the horizontal ellipsis. When displayed, it is raised
compared to the ordinary one, and in practice, it has proven quite
difficult to recognize when the ellipsis character should be shown in
one style or the other. Therefore I am not at all hopeful that any
position that puts a further burden on layout systems, viz. to
recognize which in a series of periods are intended to represent an
ellipsis, would be of any more benefit to the user of the standard.
Therefore, construing the ellipsis as a compatibility character in
that strict sense is fraught with problems. I think what we have here is
a compatibility decomposable character which, on closer inspection,
turns out not to be a compatibility character.
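To make the formal side of this concrete, here is a minimal sketch in Python, using only the standard unicodedata module. The helper name compat_decomposition is mine, purely for illustration. It reads the decomposition field of a character and returns the mapped string only when the mapping is tagged as a compatibility one:

```python
import unicodedata

def compat_decomposition(ch):
    """Return the compatibility mapping of ch as a string, or None.

    Hypothetical helper for illustration. A compatibility mapping is
    marked in the character database by a leading <tag>, e.g. <compat>;
    canonical mappings carry no tag and are ignored here.
    """
    field = unicodedata.decomposition(ch)
    if not field.startswith("<"):
        return None  # no decomposition at all, or a canonical one
    _tag, *codepoints = field.split()
    return "".join(chr(int(cp, 16)) for cp in codepoints)

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS
print(unicodedata.name(ELLIPSIS), repr(compat_decomposition(ELLIPSIS)))
```

This only reports what the character database says. Whether the ellipsis is a compatibility character in the strict sense of section 2.3 is exactly what the data cannot tell you.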
A similar issue exists with the one-dot leader, which is unified in the
standard with an Armenian punctuation character, belying any attempt to
classify it as "only encoded for compatibility". Overall, there exist in
the standard vestiges of the original, necessarily somewhat simplistic
model that the original designers of the Unicode standard brought to
their task. For them, a compatibility character was a classification
that existed in perfect black-and-white clarity.
Over time, Unicode grew to encompass mathematical notation, phonetic
characters, and a number of other things, while at the same time
freezing the definition of compatibility decomposable character (by
fixing the decompositions). Because of this, the actual nature of a
character as a compatibility character became dependent on the
type of text in which it is used, and at the same time no longer
well-aligned with the formal definition of compatibility decomposable
character.
[Just so no-one misconstrues my position: the status of many
compatibility characters, such as the Arabic positional forms, or
vertical forms, is definitely *not* in question.]
The characters that represent repetitions of the same base element,
whether the ellipsis, or the quadruple integral, take a special
position. By providing the multigraph character, Unicode allows the
author to unambiguously state his or her intention of grouping. At the
same time, the sequence of elements serves both as a fallback
representation as well as a natural way to input some of them. I think
that's a strength of the standard, but to use it, it's necessary to
recognize that some 'compatibility characters' are in fact widely used
as if they were ordinary characters, and no longer owe their
existence solely to 'legacy' mappings.
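The fallback role of the sequence can be sketched with compatibility normalization. This is only an illustration, assuming Python's standard unicodedata module; the function name fallback is hypothetical, and NFKD will also fold every other compatibility character in the text, which a real application may not want:

```python
import unicodedata

def fallback(text):
    """Replace multigraph characters by their element sequences.

    Illustrative only: applies NFKD to the whole string, so all
    compatibility decomposable characters are affected, not just
    the repeated-element ones discussed here.
    """
    return unicodedata.normalize("NFKD", text)

QUADRUPLE_INTEGRAL = "\u2a0c"  # QUADRUPLE INTEGRAL OPERATOR
INTEGRAL = "\u222b"            # INTEGRAL

# The single grouped character falls back to four separate integral signs.
print(fallback(QUADRUPLE_INTEGRAL) == INTEGRAL * 4)
```

The mapping runs in one direction only: once the text holds four separate integral signs or three separate periods, the author's statement that they form one group is gone, and a layout system is back to guessing.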
A./
PS: I noticed that in this entire discussion, the fact that Unicode must
support not only English usage, but other styles and conventions as well
seemed to have been forgotten as everyone rushed to take a stance on the
arcana of American English (academic) usage. Rather than relying on a
single authority to "know" which dot in a four dot sequence is the
period, having an ellipsis character and a full stop as separate
characters makes it possible to support both conventions, even though
only one of them might be preferable in English. (We simply do not know
all the alternative conventions.)