From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Jan 17 2006 - 17:28:07 CST
On Tue, 17 Jan 2006, Mark E. Shoulson wrote:
> Is the HORIZONTAL ELLIPSIS character, U+2026, to be preferred over using
> three periods (possibly spaced with non-breaking spaces)? Or is it only
> there for backward compatibility?
We recently had a discussion about compatibility characters in general,
and it still seems to me that the position of the Unicode Standard is
that compatibility characters should generally be avoided, except for legacy
data and other special occasions. We may disagree on what constitutes a
special occasion, but it can hardly mean "whenever you like". And surely
the general principle can be explicitly overruled in the standard for some
characters. As far as I can see, there is no such rule for the HORIZONTAL
ELLIPSIS.
On the other hand, the special role of the HORIZONTAL ELLIPSIS is
difficult to replace by other means. It seems unnecessarily clumsy to add
specific character spacing in word processing whenever you use "...",
and it is also clumsy to do similar things using markup and a style sheet.
(Using <span class="ellipsis">...</span> instead of &hellip; or the
HORIZONTAL ELLIPSIS as such doesn't sound very natural.)
Using no-break spaces or normal spaces would not be a good idea, since
it would mostly create far too much spacing. And attempts to reduce the
spacing would be pointless, since you could just as well use "..." and
increased spacing.
> Obviously, "what is preferable" depends on what the situation is, but even so
> there can be a sensible answer, e.g. "f + i" (or "f + ZWJ + i") is preferable
> to U+FB01, LATIN SMALL LIGATURE FI, right?
I think most of us agree with that, except for special situations, for
varying meanings of "special situations". On the practical side, if you
wish to use the fi ligature for typographic reasons, using U+FB01 is often
the only feasible way.
When considering the use of the fi ligature, we should weigh the
various pros and cons. For example, if I were
preparing a document to be printed and distributed and wanted f and i to
be ligated, I could rather safely use U+FB01 after checking that the
software used will handle it. But if the document were distributed in
digital format, problems would arise. For example, if the document
contained the string http://www.suomi.fi written using U+FB01 and someone
copied and pasted it into a web browser, the address would not work, of
course (unless the copy & paste implementation did something relatively
elaborate, such as normalization, which would be somewhat unexpected).
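To make the copy-and-paste problem concrete, here is a minimal sketch in
Python, assuming only the standard unicodedata module. The variable names
are illustrative. It shows that the ligated address is a different string
from the plain one, and that only a compatibility normalization (NFKC here)
turns it back into the address that was meant:

```python
import unicodedata

# Address with U+FB01 LATIN SMALL LIGATURE FI in place of the final "fi".
ligated = "http://www.suomi.\ufb01"
# The same address typed with ordinary "f" and "i".
plain = "http://www.suomi.fi"

# The two look alike on screen but are different code point sequences.
same_as_typed = ligated == plain

# Canonical normalization (NFC) leaves the compatibility character alone;
# compatibility normalization (NFKC) replaces it with "f" + "i".
nfc = unicodedata.normalize("NFC", ligated)
nfkc = unicodedata.normalize("NFKC", ligated)

print(same_as_typed)
print(nfc == ligated)
print(nfkc == plain)
```

Whether a given browser or clipboard implementation applies anything like
NFKC is a separate question; the sketch only shows what the normalization
itself does to the string.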
> Is HORIZONTAL ELLIPSIS of this class?
It's in the class of compatibility decomposable characters, but that class
contains many different types of characters. If you ask me, HORIZONTAL
ELLIPSIS should have been defined as an independent non-compatibility
character. It's too late to change that, but it might be given a different
status without changing the formal definitions.
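The formal status of both characters can be inspected from the Unicode
Character Database. The following is a small sketch using Python's standard
unicodedata module; the helper name describe is my own. It prints the
character name, its decomposition mapping (where the "<compat>" tag marks a
compatibility decomposition) and the result of NFKD normalization:

```python
import unicodedata

def describe(ch):
    """Return (name, decomposition mapping, NFKD form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKD", ch),
    )

ellipsis_info = describe("\u2026")  # HORIZONTAL ELLIPSIS
fi_info = describe("\ufb01")        # LATIN SMALL LIGATURE FI

for info in (ellipsis_info, fi_info):
    print(info)
```

Both mappings carry the same "<compat>" tag, which illustrates the point
that the class lumps together characters of rather different kinds.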
Interestingly, the Chicago Manual of Style describes the use of ellipsis
points (without identifying characters by Unicode numbers), saying that English
and many other languages use "spaced" points whereas some languages use
"unspaced" points. In Unicode terms, this seems to be a difference between
HORIZONTAL ELLIPSIS and a sequence of three copies of FULL STOP, though of
course it could also be described as typographical only, to be handled
above the character level.
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/