From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 28 2010 - 16:09:11 CDT
> Message of 26/07/10 18:45
> From: "Markus Scherer" <markus.icu@gmail.com>
> To: verdy_p@wanadoo.fr
> Cc: "Unicode Mailing List" <unicode@unicode.org>
> Subject: Re: Using Combining Double Breve and expressing characters perhaps as if struck out.
>
> There are 857 combining marks with combining class of 0:
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:M:]%26[:ccc%3D0:]]&abb=on&g=
>
> On Sat, Jul 24, 2010 at 11:25 AM, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
>
> > "Kent Karlsson" <kent.karlsson14@telia.com> wrote:
> > > On 2010-07-24 10.07, "Philippe Verdy" <verdy_p@wanadoo.fr> wrote:
> > >
> > > > Double diacritics have a combining property equal to zero, so they
> > >
> > > No, they don't. The above ones have combining class 234 and the below
> > > ones have combining class 233 (other characters with the word DOUBLE
> > > in them are 'double' in some other way):
> > >
> > > 035C;COMBINING DOUBLE BREVE BELOW;Mn;233;NSM;;;;;N;;;;;
> > > ...
> >
> > Aren't they using the maximum value of the combining class ?
>
>
> No.
>
> > If so,
> > you can still use double diacritics between two sequences containing a
> > base character and any "simple" diacritic, and be sure that the double
> > diacritic will be rendered about them, as it will remain in the last
> > position of the normalized form.
> >
>
> No. The order of combining marks only determines their rendering order if
> they have the same combining class value. If they have different values,
> then their rendering is supposed to be independent of their order in the
> text. The canonical ordering in normalization only serves processing such as
> string comparisons.
You have not understood what I wanted to say.
I know what you explain, but double diacritics can only be reordered
in one case: if an upper double diacritic occurs before a lower
diacritic (in which case normalization will reorder it; as there's
no visible difference in the result, this reordering is safe, and
CGJ is not required to protect it).
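
A minimal sketch of that one reordering case, using Python's standard
unicodedata module (the variable names and the choice of U+035D and U+035C
as the upper and lower double diacritics are mine, picked for illustration):

```python
import unicodedata

DOUBLE_BREVE_BELOW = "\u035C"  # COMBINING DOUBLE BREVE BELOW, class 233
DOUBLE_BREVE_ABOVE = "\u035D"  # COMBINING DOUBLE BREVE, class 234


def ccc(ch):
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


# Upper double diacritic typed before the lower one.
upper_first = "o" + DOUBLE_BREVE_ABOVE + DOUBLE_BREVE_BELOW + "o"
# Same marks in canonical order (233 before 234).
lower_first = "o" + DOUBLE_BREVE_BELOW + DOUBLE_BREVE_ABOVE + "o"

# Canonical ordering sorts adjacent marks by combining class,
# so the upper-first sequence is rewritten into the lower-first one.
normalized = unicodedata.normalize("NFD", upper_first)

print(ccc(DOUBLE_BREVE_BELOW), ccc(DOUBLE_BREVE_ABOVE))
print(normalized == lower_first)
```

The two spellings are canonically equivalent, which is why no CGJ is needed
to protect either of them.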
But given the way they will be encoded only between base graphemes,
there's no risk that they can be swapped by normalization or that they
could be ordered BEFORE non-double diacritics.
We can perfectly expect that sequences encoded with double diacritics
will only be in that order:
- <prependers for base 1, base 1, other simple diacritics or extenders
for base 1 only>, then
- <lower double diacritics, upper double diacritics>, then
- <prependers for base 2, base 2, other simple diacritics or extenders
for base 2 only>
That's what I meant in saying that they have the MAXIMUM combining class
value. There's also NO risk that stacking double diacritics will be
reordered within the same position, so for that use, you will never
need CGJ.
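
The ordering above can be checked with a short sketch, again with Python's
unicodedata module. The helper name is_nfd_stable and the sample marks (dot
below, acute) are assumptions chosen for illustration. Note that 233 and 234
are higher than the classes of the ordinary below (220) and above (230)
marks used here, although the UCD does contain higher values (for example
240 for U+0345).

```python
import unicodedata

DOT_BELOW = "\u0323"           # COMBINING DOT BELOW, class 220
ACUTE = "\u0301"               # COMBINING ACUTE ACCENT, class 230
DOUBLE_BREVE_BELOW = "\u035C"  # class 233
DOUBLE_BREVE = "\u035D"        # class 234


def is_nfd_stable(text):
    """True if NFD normalization leaves the text unchanged."""
    return unicodedata.normalize("NFD", text) == text


# <base 1 + simple diacritics>, <lower double, upper double>,
# <base 2 + simple diacritics>
expected_order = (
    "o" + DOT_BELOW + ACUTE
    + DOUBLE_BREVE_BELOW + DOUBLE_BREVE
    + "o" + ACUTE
)

# A double diacritic typed before a simple above mark is moved after it,
# so it still ends up last in the first cluster.
typed_early = "o" + DOUBLE_BREVE + ACUTE + "o"
moved = unicodedata.normalize("NFD", typed_early)

print(is_nfd_stable(expected_order))
print(moved == "o" + ACUTE + DOUBLE_BREVE + "o")
```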
CGJ will only be needed if you want to append a non-double diacritic
on top of a double, but given that this non-double diacritic should not
apply to the double diacritic itself, but to the whole group of base
graphemes "joined" by the double diacritics, these additional
non-double diacritics should be encoded AFTER this whole group, i.e.
just after:
- <prependers for base 2, base 2, other simple diacritics or extenders
for base 2 only>,
if we really want to respect the logical encoding order.
And for this use, CGJ will be incorrect (because the additional
diacritics will STILL be part of the base grapheme cluster 2).
We need something else, and that's where we will need ZWJ instead, as the
holder of additional diacritics that should stack on the whole group.
OK you may avoid this problem by using CGJ immediately after the
double diacritics (i.e. also before base grapheme cluster 2), but this
will remain illogical.
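
A small sketch of what CGJ does in that position, using Python's
unicodedata module (the acute accent is only an example of a non-double
diacritic, and the variable names are mine):

```python
import unicodedata

CGJ = "\u034F"           # COMBINING GRAPHEME JOINER, class 0
ACUTE = "\u0301"         # COMBINING ACUTE ACCENT, class 230
DOUBLE_BREVE = "\u035D"  # COMBINING DOUBLE BREVE, class 234

# Without CGJ, the acute (230) sorts before the double breve (234),
# so after normalization it sits directly on the first base.
unprotected = "o" + DOUBLE_BREVE + ACUTE + "o"
reordered = unicodedata.normalize("NFD", unprotected)

# With CGJ right after the double diacritic, the class-0 character
# separates the two marks and canonical ordering cannot swap them.
protected = "o" + DOUBLE_BREVE + CGJ + ACUTE + "o"
kept = unicodedata.normalize("NFD", protected)

print(reordered == "o" + ACUTE + DOUBLE_BREVE + "o")
print(kept == protected)
```

This only shows that the code point order survives normalization; whether
a renderer then stacks the acute on the double breve is a separate question.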
Well, even the double diacritics themselves are a hack in Unicode.
Ideally we should not even need them, and instead of using:
- <o, DOUBLE BREVE, o>
This should be:
- <o, ZWJ, o, ZWJ, BREVE>
Now you can see the problem: ZWJ has never been designed to create
structured layout groups, when used alone.
If layout structure grouping is needed however, we could use variation
selectors to qualify the ZWJ:
- <o, ZWJ, VS1, o, ZWJ, VS1, BREVE>
where the variation sequence <ZWJ,VS1> would mean here : horizontal
group level 1.
And so, we could encode the logical layout structures of Hieroglyphs
(that require multiple levels, both horizontally, and vertically) by
defining these variation sequences:
HGROUP1 = <ZWJ,VS1>
VGROUP1 = <ZWJ,VS2>
HGROUP2 = <ZWJ,VS3>
VGROUP2 = <ZWJ,VS4>
HGROUP3 = <ZWJ,VS5>
VGROUP3 = <ZWJ,VS6>
and so on...
With this definition, then we no longer need ANY double diacritic
variants, we just use the standard diacritics:
- <o, HGROUP1, o, HGROUP1, BREVE>
instead of the "deprecated" method using :
- <o, DOUBLE BREVE, o>
(which won't be canonically equivalent, but does it matter ?). And we
gain a consistent encoding for "triple" diacritics or longer:
- <o, HGROUP1, o, HGROUP1, o, HGROUP1, BREVE>
which represents a single BREVE over a horizontal grouping of three <o>.
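
To make the proposal concrete, here is a sketch in Python of how such
sequences could be built. Everything here is hypothetical: <ZWJ,VSn> is not
a variation sequence defined by Unicode, and the function names (vs, hgroup,
vgroup, group_with_mark) are invented for illustration only.

```python
import unicodedata

ZWJ = "\u200D"           # ZERO WIDTH JOINER
BREVE = "\u0306"         # COMBINING BREVE
DOUBLE_BREVE = "\u035D"  # COMBINING DOUBLE BREVE


def vs(n):
    """Return variation selector VSn for n in 1..16 (U+FE00..U+FE0F)."""
    if not 1 <= n <= 16:
        raise ValueError("only VS1..VS16 are handled in this sketch")
    return chr(0xFE00 + n - 1)


def hgroup(level):
    """Hypothetical HGROUPn = <ZWJ, VS(2n-1)>, as proposed above."""
    return ZWJ + vs(2 * level - 1)


def vgroup(level):
    """Hypothetical VGROUPn = <ZWJ, VS(2n)>, as proposed above."""
    return ZWJ + vs(2 * level)


def group_with_mark(bases, mark, level=1):
    """Build <b1, HGROUP, b2, HGROUP, ..., bn, HGROUP, mark>."""
    sep = hgroup(level)
    return sep.join(bases) + sep + mark


double = group_with_mark(["o", "o"], BREVE)
triple = group_with_mark(["o", "o", "o"], BREVE)
legacy = "o" + DOUBLE_BREVE + "o"

# The proposed spelling and the existing one stay distinct under
# normalization, i.e. they are not canonically equivalent.
same = unicodedata.normalize("NFD", double) == unicodedata.normalize("NFD", legacy)
print([f"U+{ord(c):04X}" for c in triple])
print(same)
```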
And with the same tool, we can almost completely encode as well the
Egyptian hieroglyphs. This could even be part of the standard
character encoding model !
Philippe.