From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Oct 22 2005 - 08:19:26 CST
From: "Asmus Freytag" <asmusf@ix.netcom.com>
>> Without such notice published with the standard itself, this standard 
>> remains confusing (and given that the representative glyph is not 
>> completely normative and can be changed at any time for another, less 
>> confusing representation of some encoded character, users may favor the 
>> interpretation given by the normative character name, hence generating 
>> encoding errors...).
>
> Glyphs are indeed not normative, because no single glyph ever captures the 
> entire range of visual appearance for the character. Nevertheless there 
> are limits on what changes can be made to a glyph, the most important of 
> them being that the change in glyph must respect the underlying character 
> identity.
I know all that (that last sentence is explained within the standard 
itself).
But this is not a definitive argument. Unicode has already changed the 
appearance of such glyphs in ways that effectively changed the underlying 
character identity. The glyph is then just a hint; it does not define the 
character identity itself. Once the glyph is set aside, what remains of the 
character's identity is its name and its normative properties.
But the character properties of two letters of the same script are almost 
identical (this is the case for the Lao letters discussed here). So only the 
normative character name remains to identify the character. Yet Unicode says 
that the name is just an identifier, without much semantic meaning, because 
it is immutable and equivalent in meaning to its associated numeric code 
point.
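To make the point concrete, here is a minimal sketch (Python, standard 
unicodedata module) showing that two letters of the same script carry nearly 
identical normative properties, so that beyond the code point only the name 
distinguishes them. The two Lao consonants U+0E9D and U+0E9F are chosen only 
for illustration; this message does not name the exact code points under 
discussion.

    import unicodedata

    for cp in (0x0E9D, 0x0E9F):
        ch = chr(cp)
        print(f"U+{cp:04X}",
              unicodedata.name(ch),            # normative, immutable name
              unicodedata.category(ch),        # general category: Lo for both
              unicodedata.bidirectional(ch),   # bidi class: L for both
              unicodedata.combining(ch))       # combining class: 0 for both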
Conclusion: the character identity is very weak. Something else must exist 
to confirm this identity. If the name is wrong, then there must be a strong 
notice, part of the standard, that explicitly says so and explains the 
expected semantics.
In fact, the semantics of a character are only confirmed by its most common 
effective usage. This is a pragmatic point of view, based on consensus about 
how the character should be interpreted; Unicode then simply endorses this 
common practice by enriching the very basic properties defined in ISO/IEC 
10646. To make this endorsement more normative, any possible confusion MUST 
be explained by Unicode itself; otherwise this is not a standard, people 
will continue to use characters the way they want, and the Unicode standard 
would not even be needed (ISO/IEC 10646 would be enough)!
If I see only one strong positive argument in favor of Unicode, it is 
exactly the set of additional normative properties that it adds to the 
ISO/IEC 10646 standard. So if there is an error in the normative character 
name (which is not part of the Unicode standard itself, but of ISO/IEC 
10646) and the name is immutable, then it is the role of Unicode to clarify 
this. If it does not, then there is no standard for the affected characters, 
and any interpretation suggested by the normative character name or by the 
representative glyph is correct.