From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Jun 16 2006 - 01:28:01 CDT
On Thu, 15 Jun 2006, John Hudson wrote:
> As far as I'm concerned, the encoding of what the standard clearly
> acknowledges as a 'glyph variant of 2018' as a separate character is itself
> contrary to the Unicode character/glyph model.
I'm afraid the statement in the standard (in the code chart) is not quite
clear either. What does it mean to say that a character is a glyph variant
of another character? The natural interpretation, I'd say, is that there
is only a glyph difference but the character was encoded separately for
some compatibility reasons, e.g. because some base standard makes such a
distinction. But then I would expect that the characters are defined as
compatibility equivalent, or perhaps even canonically equivalent. Yet
U+2018 and U+201B are two completely distinct characters. This sounds like
the result of some interesting compromise.
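The lack of any formal equivalence can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard unicodedata module, whose data reflects whatever Unicode version the interpreter ships with (not necessarily the version discussed here). The helper name "equivalent" is my own, purely for illustration.

```python
import unicodedata

LEFT_QUOTE = "\u2018"      # LEFT SINGLE QUOTATION MARK (the "6" shape)
REVERSED_QUOTE = "\u201B"  # SINGLE HIGH-REVERSED-9 QUOTATION MARK


def equivalent(a: str, b: str, form: str) -> bool:
    """Return True if a and b normalize to the same string under `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Show each character's name and its decomposition mapping (empty if none).
for ch in (LEFT_QUOTE, REVERSED_QUOTE):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)))

# NFD compares under canonical equivalence, NFKD under compatibility
# equivalence (which also includes all canonical mappings).
canonically_equivalent = equivalent(LEFT_QUOTE, REVERSED_QUOTE, "NFD")
compatibility_equivalent = equivalent(LEFT_QUOTE, REVERSED_QUOTE, "NFKD")
print(canonically_equivalent, compatibility_equivalent)
```

If both comparisons come out false and both decomposition mappings are empty, the character data treats the two as unrelated characters, whatever the code chart annotation says about glyphs.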
On the practical side, the standard probably makes its point clear in the
description of U+2018 in the code chart, when it says "this is the
preferred glyph (as opposed to U+201B)". The wording is odd (it even calls
a character a glyph), but apparently the idea is that although U+201B is
included in the standard and although no formal relationship between it
and U+2018 is defined, U+2018 and U+201B are essentially two glyph forms
of the same character, with an expressed preference for the former. By
"glyph forms", I mean the "6"-like shape and the reversed "9"-like shape.
I cannot comment on whether this policy is reasonable, but it
seems to be the current Unicode policy. Of course, it does not prevent
anyone from using U+201B if they find it correct for orthographic or
typographic reasons.
In general, quotation marks are a problem in encoding characters largely
because there has been considerable variation in the shapes of quotation
marks in printed matter and in handwriting, even within a language.
Until rather recently, it was common to regard e.g. curly quotation marks
and chevrons just as two different styles for quotes. Publications might
even use asymmetric quotes in headings but symmetric quotes in copy text,
etc. So there was a tough decision: which variation is interpreted as a
character difference?
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/