CGJ

From: Jony Rosenne (rosennej@qsm.co.il)
Date: Sat Oct 25 2003 - 08:33:07 CST


For the record, I repeat that I am not convinced that the CGJ is an
appropriate solution for the problems associated with the right Meteg. I
tend to think we need a separate character.

Jony

> -----Original Message-----
> From: unicode-bounce@unicode.org
> [mailto:unicode-bounce@unicode.org] On Behalf Of Philippe Verdy
> Sent: Saturday, October 25, 2003 1:12 PM
> To: Peter Kirk
> Cc: unicode@unicode.org
> Subject: Re: New contribution N2676
>
>
> From: "Peter Kirk" <peterkirk@qaya.org>
> > Have combining classes actually been defined for these characters?
> >
> > This is of course exactly the same problem as with Hebrew vowel points
> > and accents, except that this time it applies to real living
> > languages. Perhaps it is time to do something about these combining
> > classes which conflict with the standard.
>
> Do you mean officially documenting the correct (and strict)
> use of CGJ as the only way to bypass the default order
> required by the combining classes in normalized forms? It
> would be a good idea to document officially which uses of CGJ
> are superfluous and should be avoided in NF forms, and which
> are required.
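A minimal Python sketch of how the "superfluous vs. required" distinction could be tested mechanically. It considers only CGJ's role in blocking canonical reordering (CGJ has other uses that this ignores), and the helper name `cgj_is_required` is hypothetical; only the standard `unicodedata` module is used.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER, combining class 0


def _nfd_without_cgj(s: str) -> str:
    # Normalize first, then strip every CGJ so only the mark order is compared.
    return unicodedata.normalize("NFD", s).replace(CGJ, "")


def cgj_is_required(text: str, i: int) -> bool:
    """Return True if the CGJ at text[i] prevents a canonical reordering."""
    if text[i] != CGJ:
        raise ValueError("text[i] is not a CGJ")
    without = text[:i] + text[i + 1:]
    return _nfd_without_cgj(text) != _nfd_without_cgj(without)


# Hebrew lamed + patah (class 17) + hiriq (class 14): NFD would swap the points.
print(cgj_is_required("\u05dc\u05b7\u034f\u05b4", 2))  # True
# Hiriq before patah is already in canonical order, so this CGJ is superfluous.
print(cgj_is_required("\u05dc\u05b4\u034f\u05b7", 2))  # False
```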
>
> 1) This will affect only the input methods for those
> languages that need to "swap" the standard order of combining
> characters to keep their logical order (all this requires
> is an additional input control that tries swapping
> ambiguous orders).
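Such an input control could look roughly like the Python sketch below, assuming the typed text is already in decomposed form. The helper `protect_typed_order` is hypothetical: it inserts a CGJ wherever canonical reordering would otherwise move a typed mark, whereas a real input method would presumably do this only for pairs known to be order-sensitive.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER, combining class 0


def protect_typed_order(text: str) -> str:
    """Insert CGJ between adjacent marks that canonical ordering would swap."""
    out = []
    for ch in text:
        if out:
            prev_cc = unicodedata.combining(out[-1])
            cur_cc = unicodedata.combining(ch)
            # Canonical ordering swaps adjacent marks only when
            # CC(prev) > CC(cur) > 0; a class-0 CGJ stops that swap.
            if prev_cc > cur_cc > 0:
                out.append(CGJ)
        out.append(ch)
    return "".join(out)


typed = "\u05dc\u05b7\u05b4"  # lamed, patah, hiriq in the order typed
protected = protect_typed_order(typed)
print(unicodedata.normalize("NFD", typed) == typed)          # False: reordered
print(unicodedata.normalize("NFD", protected) == protected)  # True: kept
```

Inserting a CGJ at every descent leaves each CGJ-delimited run in non-decreasing class order, so normalization has nothing left to reorder.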
>
> 2) Complete documentation may need to specify which pairs
> of combining characters are affected. It should list the
> pairs of combining characters <c1, c2> where CC(c1) > CC(c2) > 0
> and that must be encoded as <c1, CGJ, c2> to be kept in
> logical order, as the sequence <c1, c2> will be reordered
> into <c2, c1> in normalized forms.
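The class-based part of such a list can be generated from the Unicode Character Database. A rough Python sketch using `unicodedata.combining` over the Hebrew points (U+05B0 to U+05C7, chosen here only as an example range; the helper name is hypothetical). It lists every pair whose classes conflict; deciding which of those pairs actually occur in real text still needs script expertise.

```python
import unicodedata


def conflicting_pairs(start: int, end: int) -> list:
    """Pairs <c1, c2> of marks in [start, end] that normalization reorders."""
    marks = [chr(cp) for cp in range(start, end + 1)
             if unicodedata.combining(chr(cp)) > 0]
    # Both classes are already > 0, so CC(c1) > CC(c2) is the full condition.
    return [(c1, c2) for c1 in marks for c2 in marks
            if unicodedata.combining(c1) > unicodedata.combining(c2)]


pairs = conflicting_pairs(0x05B0, 0x05C7)
print(len(pairs))  # number of order-sensitive pairs in the range
patah, hiriq = "\u05b7", "\u05b4"
print((patah, hiriq) in pairs)  # True: CC 17 > CC 14
```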
>
> 3) Another issue is that a sequence may contain combining
> characters other than those in such a pair. Suppose I want
> to represent <base, c1, c2, c3>, where CC(c1) > CC(c2), but
> c3 is not part of any conflicting pair in that list.
> Should it be encoded as <base, c1, CGJ, c2, c3> or as <base,
> c1, c3, CGJ, c2>? As the standard normalization algorithm
> cannot be changed, both sequences will remain valid in the
> NF forms, and neither will be mapped to the other, even though
> they represent the same character.
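The ambiguity can be checked directly. In this Python sketch (assuming Python 3.8+ for `unicodedata.is_normalized`), Hebrew patah (class 17) stands in for c1, hiriq (class 14) for c2, and the accent etnahta (class 220) for the non-conflicting c3; the choice of characters is only illustrative.

```python
import unicodedata

CGJ = "\u034f"
LAMED, PATAH, HIRIQ, ETNAHTA = "\u05dc", "\u05b7", "\u05b4", "\u0591"

form_a = LAMED + PATAH + CGJ + HIRIQ + ETNAHTA  # <base, c1, CGJ, c2, c3>
form_b = LAMED + PATAH + ETNAHTA + CGJ + HIRIQ  # <base, c1, c3, CGJ, c2>

# Both spellings are already in NFC and in NFD...
for form in (form_a, form_b):
    print(unicodedata.is_normalized("NFC", form),
          unicodedata.is_normalized("NFD", form))
# ...yet normalization does not map one onto the other.
print(unicodedata.normalize("NFC", form_a) == unicodedata.normalize("NFC", form_b))
```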
>
> One could design an extra normalization step to force one
> interpretation, so that only the combining characters whose
> conflicting combining classes forced a "swap" appear after
> the CGJ, with all other diacritics preferably encoded in the
> first sequence, before the CGJ.
>
> This extra step should not be part of the NF forms (because
> Unicode states that normalized forms will remain normalized
> in all future versions of Unicode), but it could be given a
> different name, describing a system in which extra
> normalization steps may transform NF forms into other
> "equivalent" sequences that are also in normalized form.
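One possible shape for such an extra step, as a Python sketch. After NFC, each mark that follows a CGJ is moved back before it when its combining class is not lower than that of the mark it would then follow (so NFC would leave it in place there), and a CGJ left with nothing to protect is dropped. Both the rule and the name `fold_after_cgj` are assumptions for illustration, not anything Unicode defines.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER, combining class 0


def fold_after_cgj(text: str) -> str:
    """Hypothetical extra step: keep only 'swapped' marks after a CGJ."""
    segments = unicodedata.normalize("NFC", text).split(CGJ)
    result = list(segments[0])
    for seg in segments[1:]:
        # Length of the run of combining marks right after this CGJ.
        run = 0
        while run < len(seg) and unicodedata.combining(seg[run]) > 0:
            run += 1
        kept = []
        for ch in seg[:run]:
            last_cc = unicodedata.combining(result[-1]) if result else 0
            # Moving ch before the CGJ is safe when it would follow a mark
            # whose class is not higher, so NFC leaves it where it lands.
            if last_cc > 0 and unicodedata.combining(ch) >= last_cc:
                result.append(ch)
            else:
                kept.append(ch)
        # Keep the CGJ if some mark still needs it, or if it was not
        # followed by marks at all (a case this sketch leaves untouched).
        if kept or run == 0:
            result.append(CGJ)
        result.extend(kept)
        result.extend(seg[run:])
    return unicodedata.normalize("NFC", "".join(result))


LAMED, PATAH, HIRIQ, ETNAHTA = "\u05dc", "\u05b7", "\u05b4", "\u0591"
form_a = LAMED + PATAH + CGJ + HIRIQ + ETNAHTA
form_b = LAMED + PATAH + ETNAHTA + CGJ + HIRIQ
print(fold_after_cgj(form_a) == fold_after_cgj(form_b) == form_b)  # True
```

Under this rule both spellings from point 3 collapse to <base, c1, c3, CGJ, c2>, and a CGJ between marks that are already in canonical order disappears.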


