From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 05 2004 - 16:42:27 EST
Peter Kirk wrote in response to Philippe Verdy:
> But you do seem to have found a real problem with the standard. If the
> character name is not guaranteed to be an accurate means of
> identification of the character, and the glyph is not normative, how can
> I know from the standard that U+01A3 is intended to be this pan-Turkic
> gha, i.e. that that is its fundamental character identity, and that it
> is not in fact a character in some other even more obscure variant Latin
> alphabet which is actually named or pronounced "oi"? Of course the notes
> do help, as does the glyph, but these are not normative.
You know by making use of the standard, where the informative
notes (= gha, * Pan-Turkic Latin alphabets) were added precisely
to enable the proper identification.
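As a rough illustration of why the notes are needed, the normative names
can be checked programmatically. The sketch below assumes Python and its
standard unicodedata module; the helper name describe is made up for this
example. The module exposes character names but not the informative
annotations from the names list, so the "= gha" note still has to be read
from the standard itself.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"

# The normative names say "OI"; nothing in them mentions "gha".
names = [describe(cp) for cp in (0x01A2, 0x01A3)]
for line in names:
    print(line)
```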
You also know, in the case of confusing edge cases, by coming
to this discussion list and browsing through the archives for
the "story" of U+01A2/U+01A3, which is abundantly documented
there, or by asking the various experts who are familiar with
the intent of the standard.
This is all *much*, *much* easier than some of the problems
posed by Han characters, where the identity of many of the
obscure, historic characters is a matter of extensive research
into the cross-references in Unihan.txt to pin them down to
sources and differentiate them from the numerous kinds of
variants that occur in the vast sea of Han characters.
For the nit-pickers, here is my assessment of the status of
name and glyph in the standard.
The character name is normative and immutable. That doesn't
mean that it is always "correct", as we have discovered in
cases such as U+01A3. The character name is also a mandatory
part of the documentation of the standard -- it is present
for every character, either explicitly listed or derivable by
a stated rule (for Hangul and Han).
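The rule-based derivation can be sketched as follows. This is a minimal
Python example, not text from the standard: the function names are
invented here, the jamo short-name tables are transcribed by hand, and
han_name does no range checking (it assumes the caller passes a code
point that really is a unified ideograph).

```python
import unicodedata

S_BASE = 0xAC00  # first precomposed Hangul syllable

# Jamo short names for leading consonants, vowels, and trailing consonants.
L_NAMES = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
           "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
V_NAMES = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
           "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
T_NAMES = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
           "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J",
           "C", "K", "T", "P", "H"]

def hangul_syllable_name(cp: int) -> str:
    """Derive the name of a precomposed Hangul syllable from its code point."""
    s_index = cp - S_BASE
    if not 0 <= s_index < len(L_NAMES) * len(V_NAMES) * len(T_NAMES):
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    # Split the index into leading, vowel, and trailing components.
    l_index, rest = divmod(s_index, len(V_NAMES) * len(T_NAMES))
    v_index, t_index = divmod(rest, len(T_NAMES))
    return "HANGUL SYLLABLE " + L_NAMES[l_index] + V_NAMES[v_index] + T_NAMES[t_index]

def han_name(cp: int) -> str:
    """Derive a unified ideograph name; assumes cp is in a unified ideograph range."""
    return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"

print(hangul_syllable_name(0xD55C))
print(han_name(0x4E00))
```

Comparing the results against unicodedata.name is a simple way to check
the tables.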
The character glyph is informative and mutable. What this means
is that the character committees are not attempting to
*standardize* the glyph shapes per se. Unicode is not a
font standard. And different fonts have been used to print
the standard(s) over the years, so there have been minor
emendations to the particulars of glyphs over the years.
However, the glyphs are also a *mandatory* part of the documentation
of the standard. They are present for every character, precisely
to assist, via a representative glyph shape, in the proper
identification of the character encoded at each code point
in the standard.
When the combination of character name and representative
glyph and associated informative annotations is insufficient
to correctly identify a character in the standard, the
recourse is to Ask the Experts and request further annotation
of the standard to keep future users from running into the
same problem.
--Ken