L2/01-038
From: Jonathan Rosenne [rosenne@qsm.co.il]
Sent: Thursday, January 18, 2001 4:16 AM
Subject: Add U+FB1D to the Composition Exclusion List

This message is written in support of Martin Duerst's proposal to the UTC.

Abstract:
1. Is it a bug or a feature?
2. What is the impact of not fixing it?
3. What is the impact of fixing it?

1. Is it a bug or a feature?

Page 805: FB1D is defined in just the same way as all other Hebrew pointed letters, e.g. FB3C.

Page 188: "These alphabetic presentation forms are included for compatibility purposes. For the preferred encoding, see Hebrew Presentation Forms, U+FB1D - U+FB4F, in the names list."

FB1D is specified to behave like all the others, and is designated not to be the preferred encoding. As evidenced by this and by UAX 15, the UTC has accepted that for Hebrew the normalized form is decomposed. The omission of FB1D from the Exclusion List is the only exception.

2. What is the impact of not fixing it?

All Hebrew presentation forms in FBxx are treated the same except for FB1D. This inconsistency will cause endless problems: many software developers will fail to notice it, and users will definitely not understand it. It will give Unicode a bad name.

The combination of the letter Yod with the point Hiriq does appear in actual Hebrew texts, with average frequency. Yod may carry, together with the Hiriq, additional combining marks such as Dagesh, Meteg and cantillation marks. The points in Hebrew are optional; the same text may contain the same word sometimes with the Hiriq and sometimes without it.

The only reasonable way to process FB1D in any meaningful way is to decompose it first. This is true for all Hebrew letters and vowels, but since they are normally decomposed, it is a problem only with FB1D. Most Hebrew applications, and they are many, do not handle composed characters because they do not expect them.
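
The "decompose first" step described above can be sketched as follows. This is only an illustration, assuming Python's standard unicodedata module; the helper names decompose and strip_points are hypothetical and chosen for this example. It shows FB1D being reduced to the ordinary sequence Yod, Hiriq, and a pointed and an unpointed spelling being compared after the points are removed.

```python
import unicodedata

YOD = "\u05D9"             # HEBREW LETTER YOD
HIRIQ = "\u05B4"           # HEBREW POINT HIRIQ
YOD_WITH_HIRIQ = "\uFB1D"  # HEBREW LETTER YOD WITH HIRIQ (presentation form)


def decompose(text: str) -> str:
    """Return the canonical decomposition (form D) of text.

    Hypothetical helper: a thin wrapper around unicodedata.normalize.
    """
    return unicodedata.normalize("NFD", text)


def strip_points(text: str) -> str:
    """Decompose text, then drop every combining mark.

    Hypothetical helper for point-insensitive comparison. It relies on
    Hebrew points and cantillation marks having a non-zero canonical
    combining class, which unicodedata.combining reports.
    """
    return "".join(
        ch for ch in decompose(text) if unicodedata.combining(ch) == 0
    )


if __name__ == "__main__":
    # FB1D decomposes to the plain letter followed by the point.
    print([f"U+{ord(ch):04X}" for ch in decompose(YOD_WITH_HIRIQ)])
    # After decomposition, pointed and unpointed spellings compare equal.
    print(strip_points(YOD_WITH_HIRIQ) == strip_points(YOD))
```

The result of composing (form C) depends on the version of the Unicode data shipped with the Python release in use, so the sketch deliberately works only on the decomposed form.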

FB1D:
- is not part of the Hebrew subsets of 10646
- is not required to support Hebrew
- is not available in Hebrew fonts
- is not supported or even recognized by most Hebrew software
- is not included in any Israeli national standard

A Hebrew text with vowels will contain several occurrences of Hiriq, some following Yod and others following other letters. For us there is no difference: the Hiriq should be treated the same way in every case. But if FB1D were not excluded, then under form C or KC the sequence Yod Hiriq would be changed everywhere to FB1D, which is not recognized and will display as a blank square or a question mark. Since it is recommended that Unicode texts be pre-normalized at the source, the user would have no control over this. Hebrew text that had passed through a conforming normalization would become unusable.

As it stands, the Unicode standard contradicts itself, in that the CompositionExclusions contradict the text quoted above (pages 188 and 805).

3. What is the impact of fixing it?

We believe the impact of fixing this now is minimal. As far as we know, FB1D has not yet been implemented and no font supports it. At the moment, no standard requires Unicode normalization, and consequently there are no conforming applications requiring modification. Once a new standard requires normalization, applications will be verified to conform, and after that a change would be difficult.

The real problem is that the Unicode Consortium would be breaking its promise to the world at large that the normalizations are now fixed and stable. But it isn't as simple as that, because at present the text of the Unicode standard regarding FB1D contradicts the CompositionExclusions, which are also part of the standard.

This is, essentially, a correction to make the Unicode standard consistent. We suggest it is the right thing to do and will be accepted as such.

Jony