From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 19 2005 - 22:51:48 CST
> Denis Jacquerye wrote:
>
>> When I asked a few months ago on Unicode-Afrique whether U+0254, U+025C,
>> U+0186 and U+0190 could be precomposed with acute, grave, circumflex or
>> caron, well-known people on this list replied it would simply be
>> impossible, because of the proposal guidelines.
>
>> From the proposal guidelines:
>> Often a proposed character can be expressed as a sequence of one or
>> more existing Unicode characters. Encoding the proposed character
>> would be a duplicate representation, and is thus not suitable for
>> encoding.
In fact it would not be theoretically impossible to encode them, but they 
could only be encoded as compatibility characters, excluded from composition 
in the normalized forms because of the normalization stability rule. This 
would really limit the usefulness of these characters: every conforming 
process that uses them would also need to support their canonical decomposed 
equivalents.
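As a small illustration, here is a Python sketch (using the standard 
unicodedata module) of how an existing composition-excluded precomposed 
character already behaves; a newly encoded precomposed open o with acute 
would have to behave the same way:

    import unicodedata

    # U+0958 DEVANAGARI LETTER QA is precomposed but excluded from
    # composition: NFC replaces it with its canonical decomposition.
    qa = "\u0958"
    print([hex(ord(c)) for c in unicodedata.normalize("NFC", qa)])
    # -> ['0x915', '0x93c']

    # Text using open o + acute stays decomposed, since no composed
    # form exists; processes must handle the sequence anyway.
    seq = "\u0254\u0301"
    print(unicodedata.normalize("NFC", seq) == seq)
    # -> True
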
So their encoding is not necessary, not even to convince font designers to 
support them: there is now a better way to convince font designers to support 
these sequences without requiring the characters to be encoded separately, 
which is to list them as named sequences in the Unicode Character Database. 
I see only one reason that could push Unicode (and in fact ISO/IEC 10646 
first) to encode them, and so add compatibility characters: the creation of 
a national character encoding standard that requires handling those 
characters as unbreakable units, each with a single code position in that 
charset.
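Applications can already treat such a sequence as a single user-perceived 
unit through grapheme cluster segmentation, without any new code point; for 
example with the third-party Python regex module (the sample word here is 
invented):

    import regex  # third-party package supporting \X grapheme clusters

    text = "\u0254\u0301l\u0254\u0301"  # hypothetical word using open o + acute
    print(regex.findall(r"\X", text))
    # the base letter and its acute accent stay together as one cluster
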
For such a national application, however, transcoding correctly from Unicode 
to the national standard would require "denormalizing" the canonically 
decomposed sequences, and because this is only an intermediate state before 
generating the national code positions, it could be achieved by internally 
mapping those characters to PUA code points given an internal canonical 
decomposition that is not excluded from recomposition.
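Something like this Python sketch, where the PUA assignments and the national 
code positions are all invented for the example:

    import unicodedata

    # Internal PUA proxies for the precomposed letters (invented
    # assignments, used only inside the transcoder, never interchanged).
    PUA_COMPOSITIONS = {
        "\u0254\u0301": "\uE000",  # open o + acute
        "\u0190\u0302": "\uE001",  # open E + circumflex
    }
    # Hypothetical single code positions in the national charset.
    NATIONAL_CODES = {"\uE000": 0xA1, "\uE001": 0xA2}

    def to_national(text):
        # Start from NFD so the decomposed sequences are predictable.
        text = unicodedata.normalize("NFD", text)
        # Tailored recomposition into the internal PUA proxies.
        for seq, pua in PUA_COMPOSITIONS.items():
            text = text.replace(seq, pua)
        # Emit national code positions (the rest of the mapping is omitted).
        return [NATIONAL_CODES.get(ch, ord(ch)) for ch in text]
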
But this would require a tailored normalization algorithm (used only as an 
intermediate step in the transcoding from Unicode to the national charset), 
whose result could in some cases be inconsistent (for stability) with the 
standard normalization forms. This would legitimately be a problem only if 
the inconsistent sequences all have an equivalent in the national standard, 
so I think the national standard would avoid the inconsistency by mapping 
the possibly problematic national precomposed characters to distinct 
decomposed Unicode sequences, possibly involving Unicode joiner controls, so 
that the mapping from the national charset to Unicode would remain 
invertible even after Unicode normalization.
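For instance, one of two otherwise colliding national characters could be 
mapped to a sequence containing U+034F COMBINING GRAPHEME JOINER, which 
normalization leaves untouched, so the two mappings stay distinct; a quick 
Python check (the collision scenario itself is hypothetical):

    import unicodedata

    plain    = "\u0254\u0301"          # <open o, acute>
    with_cgj = "\u0254\u034F\u0301"    # <open o, CGJ, acute>

    # Both sequences survive the standard normalization forms unchanged,
    # so the national-charset-to-Unicode mapping remains invertible.
    for form in ("NFC", "NFD"):
        assert unicodedata.normalize(form, plain) == plain
        assert unicodedata.normalize(form, with_cgj) == with_cgj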