From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Oct 22 2005 - 08:19:26 CST
From: "Asmus Freytag" <asmusf@ix.netcom.com>
>> Without such notice published with the standard itself, this standard
>> remains confusing (and given that the representative glyph is not
>> completely normative and can be changed at any time for another or less
>> confusing representation of some encoded character, users may favor the
>> interpretation given by the normative character name, hence generating
>> encoding errors...).
>
> Glyphs are indeed not normative, because no single glyph ever captures the
> entire range of visual appearance for the character. Nevertheless there
> are limits on what changes can be made to a glyph, the most important of
> them being that the change in glyph must respect the underlying character
> identity.
I know all that (this last sentence is explained within the standard
itself).
But this is not a definitive argument. Unicode has already changed the
appearance of such glyphs so that it effectively changed the underlying
character identity. The glyph is then just a hint, but does not define the
character identity itself. Once you ignore it, the remaining character
identity is its name and its normative properties.
But the character properties of two letters of the same script are
almost identical (this is the case for the Lao letters discussed here). So
only the normative character name remains to identify the character. But
Unicode says that the name is just an identifier, without much semantic
meaning, because it is immutable and equivalent in meaning to its
associated numeric code point.
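The point about near-identical properties can be checked with a minimal sketch like the one below. It assumes that U+0E9D and U+0E9F are representative of the Lao letters under discussion (the message does not name the code points), and it uses only the handful of properties that Python's standard unicodedata module exposes, not the full Unicode Character Database.

```python
import unicodedata

# Assumption: U+0E9D and U+0E9F stand in for "the Lao letters discussed
# here"; the message itself does not name the code points.
FO_TAM = "\u0e9d"
FO_SUNG = "\u0e9f"

def basic_properties(ch):
    """Return the few per-character properties that unicodedata exposes."""
    return {
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        "mirrored": unicodedata.mirrored(ch),
    }

# Print the name and the basic properties of each letter side by side.
for ch in (FO_TAM, FO_SUNG):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), basic_properties(ch))
```

With these properties equal for both letters, the name and the code point are the only things left in this data that tell the two characters apart.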
Conclusion: the character identity is very weak. There must exist something
else to confirm this identity. If the name is wrong, then there must exist a
strong notice, part of the standard, that explicitly says so and explains
the expected semantics.
In fact, the semantics of the character are only confirmed by its actual,
most common usage. This is a pragmatic point of view, based on consensus on
how it should be interpreted, and Unicode then just endorses this common
practice, by enriching the very basic properties defined in ISO/IEC 10646.
To make this endorsement more normative, any possible confusion MUST be
explained by Unicode itself, otherwise this is not a standard, and people
will continue to use characters the way they want, and the Unicode standard
would not even be needed (ISO/IEC 10646 would be enough)!
If I see only one strong positive argument in favor of Unicode, it is exactly
the set of additional normative properties that it adds to the ISO/IEC 10646
standard. So if there's an error in the normative character name (this is
not part of the Unicode standard itself, but part of ISO/IEC 10646) and the
name is immutable, then it is the role of Unicode to clarify this. If it does
not do that, then there's no standard for the affected characters, and any
interpretation suggested by the normative character name or by the
representative glyph is correct.