From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 19 2005 - 22:51:48 CST
> Denis Jacquerye wrote:
>
>> When I was questioning if U+0254, 025C, 0186 and U+0190 could be
>> precomposed with acute, grave, circumflex or caron a few month ago on
>> Unicode-Afrique. People notorious on this list replied it would simply
>> be impossible, because of the proposal guidelines.
>
>> From the proposal guidelines:
>> Often a proposed character can be expressed as a sequence of one or
>> more existing Unicode characters. Encoding the proposed character
>> would be a duplicate representation, and is thus not suitable for
>> encoding.
In fact it would not be theoretically impossible to encode them, but they
could only be encoded as compatibility characters, excluded from composition
in the normalized forms, due to the normalization stability rule. This would
severely limit the usefulness of these characters, as every conforming
process that used them would also need to support their canonical decomposed
equivalents.
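The current situation can be checked directly with Python's unicodedata module (a minimal sketch; the exact behaviour depends on the Unicode version the interpreter ships with). Because no precomposed "open o with acute" is encoded, NFC leaves the sequence decomposed, whereas a pair that does have a precomposed form composes:

```python
import unicodedata

# U+0254 LATIN SMALL LETTER OPEN O + U+0301 COMBINING ACUTE ACCENT:
# no precomposed character exists, so NFC leaves the sequence as-is.
seq = "\u0254\u0301"
assert unicodedata.normalize("NFC", seq) == seq

# By contrast, e + combining acute does have a precomposed form (U+00E9),
# so NFC folds the pair into the single character.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

This is exactly why a newly encoded precomposed character would have to be excluded from composition: the stability rule forbids NFC output for existing sequences from ever changing.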
So encoding them is not necessary, not even to convince font designers to
support them: there is now a better way to do that without encoding these
characters separately, namely listing them as supported named sequences in
the Unicode database. I see only one reason that could push Unicode (and in
fact ISO/IEC 10646 first) to encode them, and so add compatibility
characters: the creation of a national character encoding standard that
requires handling those characters as unbreakable units, each with a single
code position in that charset.
For such national applications, however, canonical decomposed sequences
would need to be "denormalized" (recomposed) to transcode Unicode text
correctly into the national standard. Because this would only be an
intermediate state before generating the national code positions, it could
be achieved by internally mapping those characters to PUA code points that
carry an internal canonical decomposition not excluded from recomposition.
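Such an intermediate step could look like the following sketch. Everything here is invented for illustration: the PUA code points, the national code positions, and the `to_national` helper are all hypothetical, not part of any real standard.

```python
import unicodedata

# Hypothetical table: decomposed sequences -> PUA stand-ins, used only as an
# intermediate step while transcoding Unicode text to a national charset.
PUA_RECOMPOSE = {
    "\u0254\u0301": "\ue000",  # open o + acute      -> private-use stand-in
    "\u025b\u0302": "\ue001",  # open e + circumflex -> private-use stand-in
}

# Hypothetical national code positions for the recomposed units.
NATIONAL = {"\ue000": 0xA1, "\ue001": 0xA2}

def to_national(text: str) -> list[int]:
    """Tailored 'denormalizing' recomposition, then map to national codes."""
    text = unicodedata.normalize("NFC", text)   # start from the canonical form
    for seq, pua in PUA_RECOMPOSE.items():      # recompose via PUA stand-ins
        text = text.replace(seq, pua)
    # Unmapped characters fall back to their Unicode scalar values here,
    # purely to keep the sketch self-contained.
    return [NATIONAL.get(ch, ord(ch)) for ch in text]
```

For example, `to_national("a\u0254\u0301")` yields `[0x61, 0xA1]`: the decomposed open-o-with-acute is recomposed through the PUA stand-in and emitted as a single national code position.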
But this would require a tailored normalization algorithm (to be used only
as an intermediate step in the transcoding from Unicode to the national
charset), whose result could in some cases be inconsistent with the stable
standard normalization forms. This would be a genuine problem only if the
inconsistent sequences all had an equivalent representation in the national
standard, so I think the national standard would avoid the inconsistency by
mapping the potentially inconsistent national precomposed characters to
distinct decomposed Unicode sequences, possibly involving Unicode joiner
controls, so that the mapping from the national charset to Unicode would
remain invertible even after Unicode normalization.
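One such joiner control is U+034F COMBINING GRAPHEME JOINER, which blocks canonical composition between a base and a following accent, so a distinct sequence built with it survives normalization unchanged (a small Python check; CGJ is the joiner I assume here, the text above does not name a specific one):

```python
import unicodedata

# Inserting U+034F COMBINING GRAPHEME JOINER between base and accent blocks
# canonical composition, so the sequence is stable under NFC and a round-trip
# mapping that relies on the exact sequence stays invertible.
blocked = "e\u034f\u0301"
assert unicodedata.normalize("NFC", blocked) == blocked

# Without the joiner, NFC folds the pair into the precomposed U+00E9.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```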
This archive was generated by hypermail 2.1.5 : Wed Oct 19 2005 - 22:54:08 CST