From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 16 2006 - 17:55:06 CDT
John Hudson said:
> The entire existence of U+201B is a bad idea.
And U+201F, as well, of course.
> As Jukka wrote, there is
> probably an interesting story behind its encoding. Perhaps Ken Whistler
> might be able to tell us what it was.
I've been trying, but unlike most of the original repertoire of
Unicode 1.0, I can't pin down a source for these two. U+201B and U+201F have
been present since Unicode 1.0, so they predate the merger with
ISO 10646. And even in Unicode 1.0, they weren't mapped to any
existing legacy set, as far as I can tell, nor do they derive from
XCCS or the IBM CDRA glyph list. It is possible that they were picked
up from the old AFII glyph registry, although I don't have access to
that, so somebody else will have to check.
And I don't have access to the pre-1.0 draft material here in my
office, although I might be able to dig further in my home archives
to find out when they entered the pre-1.0 drafts.
> Remember, just because something
> is in Unicode doesn't mean that the Unicode Technical Committee *wanted*
> it to be in Unicode or thinks it is a good idea.
Unfortunately, in the case of material dating back to Unicode 1.0,
the buck stops with the UTC. The repertoire in Unicode 1.0 didn't have the
benefit of foresight regarding all the potential implementation
issues that *could* conceivably arise. The character/glyph model was
reasonably well understood at the time as regards the encoding
approach to *letters*, but the picture was hazier when it came to
marks and symbols. The encoding of marks was deliberately shape-based,
because otherwise there were just too many funny edge cases and problems
with marks. I suspect that the collection of character candidates
for punctuation was heavily influenced by the decisions that had to
be taken for the marks -- namely, that it would be better to encode
distinctions based on shape, rather than assume that smart rendering
would be able to pick from an assortment of related shapes in context,
particularly since there really had been no history of implementations
that would do that. (Remember, we are talking about 1989 here.)
For years, in fact, one of the main knocks on the Unicode encoding
of punctuation was that it *under*differentiated on the basis of
shape: the unification of the baseline Latin ellipsis with the
centerline East Asian ellipsis caused no end of grief in implementations,
for example.
The encoding of U+201B (and the parallel decision for U+201F) might
also have been influenced by the inverse situation for
U+02BB and U+02BD, where the most usual form seen is the
reversed-9 glyph ("reversed apostroph"), and where the 6 glyph
seems to be a little less used.
Also remember that these decisions were taken for Unicode 1.0
as long as 13 years before the first VARIATION SELECTOR was added
to the standard, and a good 10 years before the concept of
variation selection started getting significant debate in the UTC.
If the question were revisited now, I have little doubt that
any serious attempt to encode U+201B and U+201F as glyphic variants
would be countered with a proposal to simply make use of
the variation selection mechanism to specify the particular
glyph variants in question, rather than separately encoding them as
characters.
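To make the mechanism concrete, a variation sequence is nothing more than
the base character followed by a variation selector code point in the text
stream. Here is a minimal Python sketch of that idea; note that Unicode
never actually defined a standardized variation sequence for U+2018, so
the pairing below is purely hypothetical and only illustrates how the
mechanism works.

    # Purely hypothetical pairing: no standardized variation sequence is
    # actually defined for U+2018; this only illustrates the mechanism.
    LEFT_SINGLE_QUOTATION_MARK = "\u2018"   # the ordinary "6-shaped" quote
    VARIATION_SELECTOR_1 = "\ufe00"         # VS1, U+FE00

    # Under a variation-selection approach, the "reversed-9" presentation
    # would be requested in plain text by appending a selector to the base
    # character, instead of encoding a separate character such as U+201B.
    reversed_form = LEFT_SINGLE_QUOTATION_MARK + VARIATION_SELECTOR_1

    print([hex(ord(c)) for c in reversed_form])   # ['0x2018', '0xfe00']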
--Ken