From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Tue Jul 08 2003 - 18:08:41 EDT
On 08/07/2003 12:56, Philippe Verdy wrote:
>Suppose your character PATAH-HIRIQ is accepted, and is
>defined as being canonically equivalent to <PATAH, HIRIQ>.
>Then the definition of canonical equivalence would allow any
>Unicode algorithm to decompose it in NFD to the pair of
>characters PATAH and HIRIQ, which are then immediately
>reordered into HIRIQ then PATAH.
>The composition exclusion then just forbids recombining them
>into PATAH-HIRIQ.
>
I am aware of this, and that is why I specified in my second posting a
canonical decomposition of hiriq - patah.
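
The reordering described in the quoted paragraph can be checked with a minimal sketch using Python's standard unicodedata module. The base letter LAMED is an arbitrary choice for illustration and is not taken from the thread; the code only shows what happens to the existing points U+05B7 PATAH and U+05B4 HIRIQ, not to any proposed character.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED: arbitrary base consonant (assumption)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ


def code_points(text):
    """Return the code points of text as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The order an author might want: consonant, patah, hiriq.
typed = LAMED + PATAH + HIRIQ

# Both normalization forms apply canonical reordering by combining class.
nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)

print("combining classes:", unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))
print("typed:", code_points(typed))
print("NFD:  ", code_points(nfd))
print("NFC:  ", code_points(nfc))
```

Because HIRIQ has a lower canonical combining class than PATAH, both forms should come out as consonant, hiriq, patah, whatever order was typed.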
>
>So it remains the NFC sequence: <consonant, hiriq, patah>
>And your proposed character is useless (it becomes a
>compatibility character, not recommended, exactly similar
>to the "Greek Dialytika with Tonos").
>
I take the point. Well, at least it would be specifying a distinct
graphical form, unlike dialytika with tonos. But I accept that there is
actually little to be gained by specifying such characters.
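
As a rough illustration of the Greek comparison, the sketch below assumes the character meant is U+0344 COMBINING GREEK DIALYTIKA TONOS; the thread does not give a code point, so that identification is a guess. It shows a character with a canonical decomposition that NFC never puts back together.

```python
import unicodedata

# Assumption: "dialytika with tonos" refers to U+0344, not U+0385.
DIALYTIKA_TONOS = "\u0344"

name = unicodedata.name(DIALYTIKA_TONOS)
mapping = unicodedata.decomposition(DIALYTIKA_TONOS)  # no <tag> means canonical
nfd = unicodedata.normalize("NFD", DIALYTIKA_TONOS)
nfc = unicodedata.normalize("NFC", DIALYTIKA_TONOS)

print(name)
print("decomposition mapping:", mapping)
print("NFD:", [f"U+{ord(c):04X}" for c in nfd])
print("NFC:", [f"U+{ord(c):04X}" for c in nfc])

# True if the single character does not survive NFC.
lost_in_nfc = nfc != DIALYTIKA_TONOS
print("replaced by its decomposition in NFC:", lost_in_nfc)
```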
>
>The only way to solve your problem is to make it only a
>compatibility decomposition, which is excluded from NFC
>and NFD decomposition and reordering... This would be,
>I think, the first accepted combining character with a
><compat> decomposition and not a canonical decomposition.
>In addition, the Unicode stability policy would require that
>the defined <compat> decomposition be given in canonical
>order.
>
>Look, for example, at the many Arabic <compat> decompositions, ...
>
Which are you referring to? In the Arabic block I can find only four
such decompositions, 0675-0678, and I don't see how the issue here can
be relevant as neither of the components is itself decomposable. Or
are you talking about the presentation forms? I thought these had to be
compatibility decompositions as there is formatting involved.
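
The four decompositions at 0675-0678 can be listed with a short sketch, again using only Python's unicodedata module. The helper names describe and parts are made up for this illustration.

```python
import unicodedata


def describe(cp):
    """Return (U+XXXX, character name, raw decomposition mapping) for cp."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


def parts(cp):
    """Code points in the decomposition mapping of cp, without the <tag>."""
    fields = unicodedata.decomposition(chr(cp)).split()
    return [int(f, 16) for f in fields if not f.startswith("<")]


ARABIC_RANGE = range(0x0675, 0x0679)  # 0675 through 0678 inclusive

for cp in ARABIC_RANGE:
    print(*describe(cp), sep="  ")

# Check whether any component has a decomposition of its own.
components_decomposable = any(
    unicodedata.decomposition(chr(p)) != ""
    for cp in ARABIC_RANGE
    for p in parts(cp)
)
print("any component decomposable:", components_decomposable)
```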
>...which could not be made canonical for the simple reason that
>the Unicode policy pact guarantees that the decompositions
>will be defined in canonical order, and only include a character
>pair for canonical decompositions whose second character is
>not canonically decomposable...
>
>-- Philippe.
>
As you got me looking in the Arabic presentation forms, I found an
interesting rough Arabic equivalent of what we might need for Hebrew:
0640, which is not really a letter but just a spacer, yet can carry
combining marks; see FCF2-FCF4.
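
A minimal sketch for inspecting FCF2-FCF4 follows. It prints each compatibility mapping, confirms that NFC leaves the presentation form alone while NFKD expands it onto 0640, and checks whether the expanded marks are in canonical order. The helper is_canonically_ordered is a hypothetical name written for this example.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, the spacer carrying the marks


def is_canonically_ordered(text):
    """True if no mark is followed by a mark of lower nonzero combining class."""
    classes = [unicodedata.combining(ch) for ch in text]
    return all(not (a > b > 0) for a, b in zip(classes, classes[1:]))


results = {}
for cp in (0xFCF2, 0xFCF3, 0xFCF4):
    ch = chr(cp)
    nfkd = unicodedata.normalize("NFKD", ch)
    results[cp] = {
        "mapping": unicodedata.decomposition(ch),
        "nfc_unchanged": unicodedata.normalize("NFC", ch) == ch,
        "nfkd": nfkd,
        "ordered": is_canonically_ordered(nfkd),
    }
    print(f"U+{cp:04X}", unicodedata.name(ch))
    print("  mapping:", results[cp]["mapping"])
    print("  NFKD:   ", [f"U+{ord(c):04X}" for c in nfkd])
    print("  NFC unchanged:", results[cp]["nfc_unchanged"])
    print("  canonical order:", results[cp]["ordered"])
```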
-- Peter Kirk peter.r.kirk@ntlworld.com http://web.onetel.net.uk/~peterkirk/
This archive was generated by hypermail 2.1.5 : Tue Jul 08 2003 - 18:59:22 EDT