From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Nov 21 2005 - 10:57:55 CST
For stability of normalization, there is an absolute ban on normalizing
any existing sequence to a new precomposed character (NFC).
http://www.unicode.org/standard/stability_policy.html
Thus, since any new precomposed characters would be normalized away, there
is little point in introducing them, and the committee has a policy not to
encode them.
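
A minimal sketch of what "normalized away" looks like in practice, using
Python's standard unicodedata module (a choice made for illustration, not
something referenced in this thread). It assumes the interpreter's Unicode
data treats U+2ADC FORKING as a precomposed character whose canonical
decomposition is U+2ADD U+0338 and which is excluded from recomposition;
that character is an example picked here, not one named in the message.

```python
import unicodedata

# Assumption: U+2ADC FORKING decomposes canonically to U+2ADD U+0338
# and is excluded from composition under NFC.
precomposed = "\u2adc"
sequence = "\u2add\u0338"

nfc_of_precomposed = unicodedata.normalize("NFC", precomposed)
nfc_of_sequence = unicodedata.normalize("NFC", sequence)

# Show the code points NFC produces for the precomposed input.
print([f"U+{ord(c):04X}" for c in nfc_of_precomposed])

# The existing sequence is left alone, and the precomposed
# character is replaced by that same sequence.
print(nfc_of_sequence == sequence)
print(nfc_of_precomposed == sequence)
```

If the assumption holds, the precomposed form never survives NFC, which is
the sense in which encoding it gains little.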
Note: there is one possible exception. If a precomposed character and at
least one character of its decomposition were both encoded in a new
version of Unicode, it would be possible to normalize to the precomposed
character in that new version. That would be a case like:
X ~ Y + Z
where X and either Y or Z are new. I don't think we've ever done that
since introducing NFC. Such a situation is unlikely to come up with an
existing script, but it might possibly come up with a new script.
I don't recall exactly why the Yiddish characters are treated in that
fashion; it was some years ago. Perhaps Ken or someone else recalls.
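
For readers who want to check how the Yiddish digraphs behave, here is a
hedged sketch using Python's standard unicodedata module. It assumes the
interpreter's Unicode data gives U+05F0, U+05F1 and U+05F2 no decomposition
mapping at all, and that U+FB1F (the presentation form for the pointed
double yod, a code point not mentioned in this thread) decomposes canonically
to U+05F2 U+05B7 without being recomposed by NFC.

```python
import unicodedata

# The three unpointed Yiddish digraphs discussed in the quoted message.
digraphs = {
    "double vav": "\u05f0",
    "vav yod": "\u05f1",
    "double yod": "\u05f2",
}

# An empty string from decomposition() means no mapping is defined,
# neither canonical nor compatibility.
mappings = {name: unicodedata.decomposition(ch) for name, ch in digraphs.items()}
for name, mapping in mappings.items():
    print(name, repr(mapping))

# Compatibility decomposition therefore leaves each digraph unchanged.
unchanged = all(unicodedata.normalize("NFKD", ch) == ch for ch in digraphs.values())
print(unchanged)

# Assumption: U+FB1F normalizes to the sequence U+05F2 U+05B7 under NFC.
pointed = unicodedata.normalize("NFC", "\ufb1f")
print([f"U+{ord(c):04X}" for c in pointed])
```

The sketch only reports what the character data says; it does not explain
why the digraphs were encoded this way.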
See also the FAQ at http://www.unicode.org/faq/ on characters and
combining marks, and on normalization.
Mark
Cary Karp wrote:
> Quoting Mark E. Shoulson:
>
>> I'd venture to say that double-vav, vav-yod, and yod-yod ligatures
>> should have *canonical* decomposition to their constituent letters!
>> I'm sure that would cause problems of some sort, but at least
>> compatibility decomposition is necessary.
>>
>> Doesn't really matter which is the more frequently entered; we
>> normalize strings all the time in Unicode.
>
> Why are they not being normalized here?
>
> I assume that at least part of the answer lies in the fourth Yiddish
> digraph 'pasekh tsvey yudn', HEBREW LIGATURE YIDDISH DOUBLE YOD WITH
> HEBREW POINT PATAH (U+05F2 U+05B7). Which (I further assume) would
> decompose and recompose correctly only if the YIDDISH DOUBLE YOD
> ligature were the canonical form. What I don't understand, is why the
> entire pointed digraph wasn't represented as a single precombined
> character, with it then being possible to decompose the other three
> ligatures as Mark suggests.
>
> With apologies for not having been able to locate the answers to the
> following questions and thus needing to pose them on this list:
>
> Is there a categorical ban on the assignment of code points to new
> characters that can be represented by combining preexisting characters
> and, if so, where will I find a citable reference to it?
>
> /Cary