L2/01-038

From: Jonathan Rosenne [rosenne@qsm.co.il]
Sent: Thursday, January 18, 2001 4:16 AM
Subject: Add U+FB1D to the Composition Exclusion List

This message is written in support of Martin Duerst's proposal to the UTC.

Abstract:

1. Is it a bug or a feature?

2. What is the impact of not fixing it?

3. What is the impact of fixing it?


1. Is it a bug or a feature?

Page 805: FB1D is defined just the same way as all other Hebrew pointed letters,
e.g. FB3C.

Page 188: "These alphabetic presentation forms are included for compatibility
purposes. For the preferred encoding, see Hebrew Presentation Forms, U+FB1D -
U+FB4F, in the names list." FB1D is specified to behave like all the others, and
is designated not to be the preferred encoding.

As evidenced by this and by UAX 15, the UTC has accepted that for Hebrew the
normalized form is decomposed. The omission of FB1D from the Exclusion List is
the only exception.
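
As a rough illustration of "defined just the same way", the canonical
decomposition mappings of FB1D and FB3C can be read from the Unicode Character
Database. The sketch below assumes Python's standard unicodedata module; the
helper name canonical_decomposition is only illustrative.

```python
import unicodedata

def canonical_decomposition(cp: int) -> str:
    """Return the canonical decomposition mapping of a code point.

    Returns '' when there is no mapping, or when the mapping is a
    compatibility one (those are tagged, e.g. '<compat> ...').
    """
    mapping = unicodedata.decomposition(chr(cp))
    return "" if mapping.startswith("<") else mapping

for cp in (0xFB1D, 0xFB3C):
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name}: {canonical_decomposition(cp)}")
# U+FB1D maps to 05D9 05B4 (Yod, Hiriq)
# U+FB3C maps to 05DC 05BC (Lamed, Dagesh)
```

Both characters carry a plain two-character canonical mapping of the same
shape: a base letter followed by one point.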

2. What is the impact of not fixing it?

All Hebrew presentation forms in FBxx are the same except for FB1D. This
inconsistency will cause endless problems: many software developers will fail to
notice it, and users will definitely not understand it.

It will give Unicode a bad name.

The combination of the letter Yod with the point Hiriq does appear in actual
Hebrew texts, with average frequency.

Yod may carry, along with the Hiriq, additional combining marks such as Dagesh,
Meteg and cantillation marks. The points in Hebrew are optional: the same word
may appear in the same text sometimes with the Hiriq and sometimes without it.
The only reasonable way to process FB1D in any meaningful way is to decompose it
first. This is true for all Hebrew letters and vowels, but since they are
normally decomposed it is a problem only with FB1D.
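
A minimal sketch of why decomposition has to come first, assuming Python's
standard unicodedata module. The helper strip_marks is hypothetical and is
not taken from any existing Hebrew software: it decomposes the text and then
drops every nonspacing mark, so that pointed and unpointed spellings compare
equal.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose, then drop nonspacing marks (points, Dagesh, Meteg,
    cantillation marks all have general category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

YOD, HIRIQ, DAGESH = "\u05D9", "\u05B4", "\u05BC"

# The same letter as it might appear in one text:
# precomposed, pointed, pointed with Dagesh, and unpointed.
variants = ["\uFB1D", YOD + HIRIQ, YOD + HIRIQ + DAGESH, YOD]

print(all(strip_marks(v) == YOD for v in variants))  # True
```

Without the NFD step, FB1D would pass through the filter untouched, because
the precomposed character is itself a letter and not a mark.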

Most Hebrew applications, and they are many, do not handle composed characters
because they do not expect them.


FB1D:
- is not part of the Hebrew subsets of 10646
- is not required to support Hebrew
- is not available in Hebrew fonts
- is not supported or even recognized by most Hebrew software
- is not included in any Israeli national standard

A Hebrew text with vowels will contain several occurrences of Hiriq, some
following Yod and others following other letters. For us there is no
difference: the Hiriq should be treated the same way in every case. But if FB1D
were not excluded, then under form C or KC the sequence Yod Hiriq would be
changed everywhere to FB1D, which is not recognized and will display as a blank
square or a question mark.
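
The effect can be sketched as follows. The function compose_without_exclusion
is purely hypothetical: it imitates only the Yod + Hiriq step of a composing
normalization in which FB1D is not excluded, and is not a real normalizer.

```python
YOD, MEM, HIRIQ = "\u05D9", "\u05DE", "\u05B4"
YOD_WITH_HIRIQ = "\uFB1D"

def compose_without_exclusion(text: str) -> str:
    """Hypothetical: replace every Yod + Hiriq pair with U+FB1D, as a
    composing form would if U+FB1D were not excluded."""
    return text.replace(YOD + HIRIQ, YOD_WITH_HIRIQ)

# One Hiriq after Mem, one Hiriq after Yod.
sample = MEM + HIRIQ + " " + YOD + HIRIQ
composed = compose_without_exclusion(sample)

# Software that looks for the Hiriq code point now finds only one of two.
print(sample.count(HIRIQ), composed.count(HIRIQ))  # 2 1
```

The Hiriq after Mem remains a separate combining mark, while the one after Yod
disappears into a character that the software does not expect.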

Since it is recommended that Unicode texts should be pre-normalized at the
source, the user would have no control over it. Hebrew text which passed through
a conforming normalization would become unusable.

As it stands, the Unicode standard contradicts itself, in that the
CompositionExclusions contradict the text quoted above (pages 188 and 805).


3. What is the impact of fixing it?

We believe the impact of fixing this now is minimal.

As far as we know, FB1D has not yet been implemented and no font supports it.

At the moment, no standard requires Unicode normalization and consequently there
are no conforming applications requiring modification. Once a new standard
requires normalization, applications will be verified to conform, and after
that a change would be difficult.

The real problem is that the Unicode Consortium will be breaking its promise to
the world at large that the normalizations are now fixed and stable. But it
isn't as simple as that, because at present the Unicode standard text regarding
FB1D is in contradiction to the CompositionExclusions which are also part of the
standard.

This is, essentially, a correction to make the Unicode standard consistent. We
suggest it is the right thing to do and will be accepted as such.

Jony