Re: U+0140

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 19 2004 - 16:03:07 EDT

Next message: Kenneth Whistler: "Re: Web Form: Subj: Unicode conversion- Microsoft Visual C++ compiler"

Previous message: Raymond Mercier: "Re: Web Form: Subj: Unicode conversion- Microsoft Visual C++ compiler"
Maybe in reply to: Patrick Andries: "Re: U+0140"
Next in thread: Peter Kirk: "Re: U+0140"
Reply: Peter Kirk: "Re: U+0140"

John Hudson responded to Michael Everson:

> Michael Everson wrote:
>
> >> This would make the mid-dot too high. The top dot of the colon usually
> >> sits toward the top of the x-height; the *mid*-dot should sit lower,

> > John, I just don't believe you. I don't believe that in all the history
> > of Greek and Catalan typography this careful hairsplitting has *always*
> > taken place; certainly in scientific transcription the HALF TRIANGULAR
> > COLON is just the top dot in the TRIANGULAR COLON, and in Americanist
> > transcription where the dot-colons are used instead of triangles I would
> > say the same applies.
>
> I never contested that the dots of a colon correspond to the triangles of the linguistic
> long vowel marker. They clearly do. What I contested was that the typographic mid-point
> (U+00B7) corresponded to the top dot of a colon. It clearly does not. It is called a
> mid-point because it sits midway up the x-height. It is used in this position for a
> variety of stylistic purposes, ...

I think we have two typographers here arguing somewhat at cross-purposes.
Clearly the typographic "mid-point" behaves as John has mentioned, and is
designed as such in many fine fonts (examples seen among the exhibits that
Asmus gathered).

But just as clearly, there is a long, long tradition in Americanist
orthographic practice (which is used widely for linguistic orthographies
outside of Native America as well) of using a "raised dot" for an indication
of vocalic (and occasionally consonantal) length. For 100 years, that
raised dot was mechanically generated by, among other means, filing the
lower dot off a colon key on a mechanical typewriter. (I have such a
typewriter sitting on my desk.) Linguists got used to this raised dot
height, coordinated with a colon in design (which then could be used, among
other things, to indicate a prolonged length, when two degrees of length
were in question), and that preference made its way into print, at least
for many North American languages, where the raised dot could be printed
at x-height, rather than at midway up the x-height, which would be too
low for most of the linguistic usage.

Enter the electronic age. ASCII had no MIDDLE DOT. It was period (.), colon (:)
or the highway. Early linguistic material on computers made do with those,
because they had no choice. The IBM PC and the Macintosh introduced a
MIDDLE DOT (0xFA [= IBM CDRA SD630000 "Middle Dot"] and 0xE1, respectively).
When ISO 8859-1 was defined, it also had a MIDDLE DOT (0xB7). *Everybody*
made use of that MIDDLE DOT for anything that was vaguely in the ballpark --
the typographical mid-point, the linguistic length mark, the mathematical
multiplication operator, the Greek ano teleia, the dictionary hyphenation
point, and, yes, the Catalan middle dot. The fact that each of those usages
might have extremely fine typographical hairs to split regarding the rendering
was so much horsepucky as far as the character identity was concerned. You
used what you had available to represent your data.
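A minimal sketch of the point above, checking that the three legacy byte
values cited all land on the same Unicode character. The codec names are
assumptions on my part: "cp437" stands in for the IBM PC code page and
"mac_roman" for the Macintosh character set, as Python's standard codecs
name them. The dictionary and function names are hypothetical.

```python
# Legacy single-byte values for the middle dot, as cited in the message.
# The codec names are assumptions: Python's "cp437" for the IBM PC and
# "mac_roman" for the Macintosh; "latin_1" is ISO 8859-1.
LEGACY_MIDDLE_DOTS = {
    "cp437": 0xFA,
    "mac_roman": 0xE1,
    "latin_1": 0xB7,
}


def decode_legacy_middle_dots():
    """Decode each legacy byte with its codec and return the resulting characters."""
    return {
        codec: bytes([byte]).decode(codec)
        for codec, byte in LEGACY_MIDDLE_DOTS.items()
    }


if __name__ == "__main__":
    for codec, ch in decode_legacy_middle_dots().items():
        print(f"{codec}: 0x{LEGACY_MIDDLE_DOTS[codec]:02X} -> U+{ord(ch):04X}")
```

Under those assumptions, each byte decodes to U+00B7, whatever the data
originally meant by it.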

The Unicode Standard, for a variety of reasons -- some of which included
compatibility mapping concerns to other standards which had started to
proliferate middle dots -- added a collection of middle dots *besides*
U+00B7, *the* middle dot, to its repertoire. Those other middle dots give
people textual representation alternatives now, if they need to make
distinctions, and textual rendering alternatives, if they need to make
middle dots which display with slightly different heights, sizes, or
spacings, depending on the rendering requirements.
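To make the alternatives concrete, here is a small sketch that lists a few
of the dot characters alongside U+00B7, with their names and general
categories as reported by Python's unicodedata module. The selection of
code points is my own illustrative choice, not a complete inventory, and
the helper names are hypothetical.

```python
import unicodedata

# An illustrative selection of dot characters (an assumption of this
# sketch, not an exhaustive list of the middle dots in the standard).
DOTS = ["\u00b7", "\u0387", "\u02d1", "\u2027", "\u22c5"]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def dot_table():
    """Describe every character in DOTS, in order."""
    return [describe(ch) for ch in DOTS]


if __name__ == "__main__":
    for code, name, category in dot_table():
        print(f"{code}  {category}  {name}")
    # U+0387 is canonically equivalent to U+00B7, so normalization folds
    # the two back together.
    print(unicodedata.normalize("NFC", "\u0387") == "\u00b7")
```

Note that the general categories differ (punctuation, modifier letter, math
symbol), which is one place the property differences show up, and that
normalization erases the U+0387 versus U+00B7 distinction entirely.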

What is clear, however, is that it is utterly impossible to satisfy
everybody regarding middle dots. Typographical purists will always want
plain text to make more distinctions. Text processing requirements will
abhor the splitting of text representation into more and more difficult-to-
distinguish glyph representations without clear semantic differences.
And dot proliferation *always* poses difficulty for establishing
character properties.

Before people bluster on too much further on this thread, it would
be good for everyone to recall that the *reason* why U+00B7 has
problematical properties is that it was inherently ambiguous in
*preexisting* usage (that is, prior to Unicode altogether) as punctuation
versus length mark (and other things as well). This puts it in the
same grab bag of very difficult, ambiguous ASCII characters, such as
"~", "*", and "'", which also acquired conflicting usages during their
reign among the small set of available punctuation and symbols in
ASCII.

History has consequences. The history of a character's encoding also
has consequences for how the Unicode Standard is to be used and
interpreted.

--Ken

This archive was generated by hypermail 2.1.5 : Mon Apr 19 2004 - 16:56:36 EDT