U+201B (was: Re: Glyphs for German quotation marks)

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 16 2006 - 17:55:06 CDT

    John Hudson said:

    > The entire existence of U+201B is a bad idea.

    And U+201F, as well, of course.

    > As Jukka wrote, there is
    > probably an interesting story behind its encoding. Perhaps Ken Whistler
    > might be able to tell us what it was.

    I've been trying, but unlike most of the original repertoire of
    Unicode 1.0, I can't pin a source to these two. U+201B and U+201F have
    been present from Unicode 1.0, so they predate the merger with
    ISO 10646. And even in Unicode 1.0, they weren't mapped to any
    existing legacy set, as far as I can tell, nor do they derive from
    XCCS or the IBM CDRA glyph list. It is possible that they were picked
    up from the old AFII glyph registry, although I don't have access to
    that, so somebody else will have to check.

    And I don't have access to the pre-1.0 draft material here in my
    office, although I might be able to dig further in my home archives
    to find out when they entered the pre-1.0 drafts.

    > Remember, just because something
    > is in Unicode doesn't mean that the Unicode Technical Committee *wanted*
    > it to be in Unicode or thinks it is a good idea.

    Unfortunately, in the case of material dating back to Unicode 1.0,
    the buck stops with the UTC. The repertoire in Unicode 1.0 didn't have the
    benefit of foresight regarding all the potential implementation
    issues that *could* conceivably arise. The character/glyph model was
    reasonably well understood at the time as regards the encoding
    approach to *letters*, but the encoding was hazier when it came to
    marks and symbols. The encoding of marks was deliberately shape-based,
    because there were just too many funny edge cases and problems with
    marks, otherwise. I suspect that the collection of character candidates
    for punctuation was heavily influenced by the decisions that had to
    be taken for the marks -- namely, that it would be better to encode
    distinctions based on shape, rather than assume that smart rendering
    would be able to pick from an assortment of related shapes in context,
    particularly since there really had been no history of implementations
    that would do that. (Remember, we are talking about 1989 here.)

    For years, in fact, one of the main knocks on the Unicode encoding
    of punctuation was that it *under*differentiated on the basis of
    shape: the unification of the baseline Latin ellipsis with the
    centerline East Asian ellipsis caused no end of grief in implementations,
    for example.

    The encoding of U+201B (and the parallel decision for U+201F) might
    also have been influenced by the inverse situation for
    U+02BB and U+02BD, where the most usual form seen is the
    reversed-9 glyph ("reversed apostrophe"), and where the 6 glyph
    seems to be a little less used.
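
    For readers who want to check which code point is which, here is a
    minimal sketch that looks up the four characters mentioned above in
    the Unicode Character Database, as exposed by Python's standard
    unicodedata module. The helper name describe and the list CODE_POINTS
    are illustrative assumptions, not anything from the standard or from
    this thread.

```python
import unicodedata

# The four code points discussed in the message above.
CODE_POINTS = [0x201B, 0x201F, 0x02BB, 0x02BD]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

# Print one line per character: label, general category, name.
for label, name, category in (describe(cp) for cp in CODE_POINTS):
    print(f"{label}  {category}  {name}")
```

    The two quotation marks come back with general category Pi (initial
    punctuation), while the two modifier letters come back as Lm, which
    reflects the punctuation-versus-letter split between the two pairs.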

    Also remember that these decisions were taken for Unicode 1.0
    as long as 13 years before the first VARIATION SELECTOR was added
    to the standard, and a good 10 years before the concept of
    variation selection started getting significant debate in the UTC.
    If the question were revisited now, I have little doubt that
    any serious attempt to encode U+201B and U+201F as glyphic variants
    would have been countered with a proposal to simply make use of
    the variation selection mechanism to specify the particular
    glyph variants in question, rather than separately encoding them as


