L2/01-008

From: Peter_Constable@sil.org
Sent: Friday, December 22, 2000 9:32 AM
Subject: Argument to add U+FB1D to the Composition Exclusion list

On 12/21/2000 07:00:18 PM "Martin J. Duerst" wrote:

>I really don't get how this would work in this case.
>Is the proposal to introduce e.g.
>FB1C NEW HEBREW LETTER YOD WITH HIRIQ ?
>
>If that is introduced, how would this affect the
>fact that:
>
>- Ideally, YOD and HIRIQ should be decomposed
>
>- If YOD is followed by HIRIQ, it will have to be
>  normalised to U+FB1D

This is precisely what came to my mind.

In the case of FB1E, I suppose one could deprecate this and replace it with another character that had a compatibility decomposition to 05BF, and that would not create problems for stability of the normalization forms. It would mean, though, that people may have to revise mapping tables, and there's the possibility that existing data now contains a deprecated character. But the other option - adding a compatibility decomposition for FB1E to 05BF - makes better sense. This doesn't affect the *composition* part of NFC and NFKC since those are fixed to ver. 3.0.0, and UTR 15 does not lock NFKD to ver. 3.0.0.

But, in the case of FB1D, matters are different. As Martin observes, we cannot deprecate this character because, as things currently stand, users are obliged to use it in NFC. I'm inclined to argue, along with Martin, that we should treat this as an erratum to ver. 3.0.0 and add this character to the composition exclusion table.

Evidence and arguments:

Findings of fact:

If you look in the compatibility and specials area U+F900 - U+FFFF, the vast majority of characters have singleton canonical decompositions, or have compatibility decompositions. The complete list of exceptions is given at the end of this message. They fall into two types:

A) characters that have *no* decomposition. These are characters that are not considered compatibility characters (in spite of some names) or are discouraged from use in any way.

B) characters that have non-singleton canonical decompositions. These are potential candidates for composition in NFC, NFKC unless explicitly excluded in the composition exclusions file.

Of all the characters in set A, there is precisely one that should not have been in that set: FB1E (because it should have had a compatibility decomposition).

Of all the characters in set B, there is precisely one character that does not appear in the composition exclusions list: FB1D.

Argumentation:

FB1E: Adding a compatibility decomposition for FB1E in a future version has no detrimental impact on any normalization forms or existing mapping tables. Two different sources have testified that this is a glyph variant of 05BF HEBREW POINT RAFE. I therefore propose that a compatibility decomposition for this character be added in a future version.

FB1D: Adding FB1D to the composition exclusion list at this point has the potential to create problems, since it would go against the guarantees laid down that composition for NFC, NFKC would be fixed at ver. 3.0.0. A change would impact existing implementations, but more importantly there is concern that a change would suggest that guarantees from the UTC are not trustworthy.

In spite of these concerns, we have a case of obvious oversight. Furthermore, the findings of fact demonstrate that there is no other possible case of such oversight. Thus, such a change is not something that could ever be repeated. The impact on existing implementations is clearly a one-time matter and clearly the result of an oversight.

Apart from the issue of impact on existing implementations, the change makes sense: FB1D was intended to be considered a compatibility character, the use of which was to be avoided where possible, and the change makes that possible. Not making the change, however, requires that this character be used forever. It would be a minor one-time change to fix an obvious oversight, and it is a change that makes for a more sensible state of affairs.
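The point about users being obliged to use FB1D in NFC can be checked with a short Python sketch. It is only an illustration, not part of the original proposal. It relies on the standard `unicodedata` module, and the helper names `codepoints` and `nfc_keeps_precomposed` are made up for this example. Note that the Unicode data bundled with a given Python build is later than ver. 3.0.0, so the printed results describe that later version rather than the situation discussed in this message.

```python
import unicodedata

YOD_HIRIQ_SEQ = "\u05D9\u05B4"    # HEBREW LETTER YOD + HEBREW POINT HIRIQ
YOD_HIRIQ_PRECOMPOSED = "\uFB1D"  # HEBREW LETTER YOD WITH HIRIQ


def codepoints(s: str) -> str:
    """Render a string as space-separated hex code points (hypothetical helper)."""
    return " ".join(f"{ord(c):04X}" for c in s)


def nfc_keeps_precomposed(ch: str) -> bool:
    """Return True if NFC leaves the single character `ch` unchanged.

    A character with a non-singleton canonical decomposition survives NFC
    only if it is NOT on the composition exclusion list.
    """
    return unicodedata.normalize("NFC", ch) == ch


if __name__ == "__main__":
    # Which version of the Unicode data this Python build ships with.
    print("Unicode data version:", unicodedata.unidata_version)

    # The canonical decomposition recorded for FB1D.
    print("decomposition(FB1D):", unicodedata.decomposition(YOD_HIRIQ_PRECOMPOSED))

    # What NFC does to the decomposed sequence and to the precomposed form.
    print("NFC(05D9 05B4):", codepoints(unicodedata.normalize("NFC", YOD_HIRIQ_SEQ)))
    print("NFC(FB1D):     ", codepoints(unicodedata.normalize("NFC", YOD_HIRIQ_PRECOMPOSED)))

    # For comparison: FB31 was already on the exclusion list in ver. 3.0.0.
    print("NFC keeps FB31:", nfc_keeps_precomposed("\uFB31"))
```

If NFC maps 05D9 05B4 to FB1D, the character is still a composition target. If NFC maps FB1D to 05D9 05B4, it is being treated as excluded.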
As far as existing data is concerned, the only data in question is data in NFC or NFKC that contains FB1D and that has been created since ver. 3.0.0 was published. The amount of data that meets these criteria is quite possibly nil; there certainly is not very much of it.

I therefore propose that FB1D be added to the composition exclusion table *as an erratum to ver. 3.0.0*.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail:
---------------------------------------------------------------------------

Appendix: Complete list of characters in the range U+F900 - U+FFFF that are neither singleton canonical decompositions nor compatibility decompositions (i.e. are potential candidates for use in NFC, NFKC).

FA0E  CJK COMPATIBILITY IDEOGRAPH-FA0E
FA0F  CJK COMPATIBILITY IDEOGRAPH-FA0F
FA11  CJK COMPATIBILITY IDEOGRAPH-FA11
FA13  CJK COMPATIBILITY IDEOGRAPH-FA13
FA14  CJK COMPATIBILITY IDEOGRAPH-FA14
FA1F  CJK COMPATIBILITY IDEOGRAPH-FA1F
FA21  CJK COMPATIBILITY IDEOGRAPH-FA21
FA23  CJK COMPATIBILITY IDEOGRAPH-FA23
FA24  CJK COMPATIBILITY IDEOGRAPH-FA24
FA27  CJK COMPATIBILITY IDEOGRAPH-FA27
FA28  CJK COMPATIBILITY IDEOGRAPH-FA28
FA29  CJK COMPATIBILITY IDEOGRAPH-FA29
FB1E  HEBREW POINT JUDEO-SPANISH VARIKA
FD3E  ORNATE LEFT PARENTHESIS
FD3F  ORNATE RIGHT PARENTHESIS
FE20  COMBINING LIGATURE LEFT HALF
FE21  COMBINING LIGATURE RIGHT HALF
FE22  COMBINING DOUBLE TILDE LEFT HALF
FE23  COMBINING DOUBLE TILDE RIGHT HALF
FEFF  ZERO WIDTH NO-BREAK SPACE
FFF9  INTERLINEAR ANNOTATION ANCHOR
FFFA  INTERLINEAR ANNOTATION SEPARATOR
FFFB  INTERLINEAR ANNOTATION TERMINATOR
FFFC  OBJECT REPLACEMENT CHARACTER
FFFD  REPLACEMENT CHARACTER

FB1D  HEBREW LETTER YOD WITH HIRIQ                 05D9 05B4
FB1F  HEBREW LIGATURE YIDDISH YOD YOD PATAH        05F2 05B7
FB2A  HEBREW LETTER SHIN WITH SHIN DOT             05E9 05C1
FB2B  HEBREW LETTER SHIN WITH SIN DOT              05E9 05C2
FB2C  HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT  FB49 05C1
FB2D  HEBREW LETTER SHIN WITH DAGESH AND SIN DOT   FB49 05C2
FB2E  HEBREW LETTER ALEF WITH PATAH                05D0 05B7
FB2F  HEBREW LETTER ALEF WITH QAMATS               05D0 05B8
FB30  HEBREW LETTER ALEF WITH MAPIQ                05D0 05BC
FB31  HEBREW LETTER BET WITH DAGESH                05D1 05BC
FB32  HEBREW LETTER GIMEL WITH DAGESH              05D2 05BC
FB33  HEBREW LETTER DALET WITH DAGESH              05D3 05BC
FB34  HEBREW LETTER HE WITH MAPIQ                  05D4 05BC
FB35  HEBREW LETTER VAV WITH DAGESH                05D5 05BC
FB36  HEBREW LETTER ZAYIN WITH DAGESH              05D6 05BC
FB38  HEBREW LETTER TET WITH DAGESH                05D8 05BC
FB39  HEBREW LETTER YOD WITH DAGESH                05D9 05BC
FB3A  HEBREW LETTER FINAL KAF WITH DAGESH          05DA 05BC
FB3B  HEBREW LETTER KAF WITH DAGESH                05DB 05BC
FB3C  HEBREW LETTER LAMED WITH DAGESH              05DC 05BC
FB3E  HEBREW LETTER MEM WITH DAGESH                05DE 05BC
FB40  HEBREW LETTER NUN WITH DAGESH                05E0 05BC
FB41  HEBREW LETTER SAMEKH WITH DAGESH             05E1 05BC
FB43  HEBREW LETTER FINAL PE WITH DAGESH           05E3 05BC
FB44  HEBREW LETTER PE WITH DAGESH                 05E4 05BC
FB46  HEBREW LETTER TSADI WITH DAGESH              05E6 05BC
FB47  HEBREW LETTER QOF WITH DAGESH                05E7 05BC
FB48  HEBREW LETTER RESH WITH DAGESH               05E8 05BC
FB49  HEBREW LETTER SHIN WITH DAGESH               05E9 05BC
FB4A  HEBREW LETTER TAV WITH DAGESH                05EA 05BC
FB4B  HEBREW LETTER VAV WITH HOLAM                 05D5 05B9
FB4C  HEBREW LETTER BET WITH RAFE                  05D1 05BF
FB4D  HEBREW LETTER KAF WITH RAFE                  05DB 05BF
FB4E  HEBREW LETTER PE WITH RAFE                   05E4 05BF

End of appendix.
---------------------------------------------------------------------------
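The survey behind the appendix can be approximated with the Python sketch below. It is a rough reconstruction under stated assumptions, not the procedure actually used for this message. It uses the standard `unicodedata` module; `classify` and `candidates` are hypothetical names introduced here. Because Python bundles a later version of the Unicode data than ver. 3.0.0, the resulting list should be expected to differ from the one above.

```python
import unicodedata


def classify(cp: int) -> str:
    """Classify a code point by the kind of decomposition it has.

    Returns one of "none", "compatibility", "singleton" (canonical, one
    code point) or "non-singleton" (canonical, two or more code points).
    """
    decomp = unicodedata.decomposition(chr(cp))
    if not decomp:
        return "none"
    if decomp.startswith("<"):
        # Compatibility mappings carry a tag such as <compat> or <font>.
        return "compatibility"
    return "singleton" if len(decomp.split()) == 1 else "non-singleton"


def candidates(start: int = 0xF900, end: int = 0xFFFF):
    """List assigned characters in the range that fall into types A or B."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Skip unassigned code points, noncharacters and private use.
        if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
            continue
        kind = classify(cp)
        if kind in ("none", "non-singleton"):
            found.append((cp, unicodedata.name(ch, "<unnamed>"), kind))
    return found


if __name__ == "__main__":
    for cp, name, kind in candidates():
        decomp = unicodedata.decomposition(chr(cp))
        print(f"{cp:04X}  {name}  [{kind}]  {decomp}")
```

Entries reported as "none" correspond to type A in the message, and entries reported as "non-singleton" correspond to type B.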