From: Mark Davis (mark.davis@jtcsv.com)
Date: Sat Nov 08 2003 - 23:29:44 EST
You are stating many things as if they were facts, when they are simply not
true. You should verify them against the definitions before stating them in such
a 'definitive' way.
Examples:
- VS1 is a combining character, and not a base character.
http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?ch=FE00
- Default grapheme clusters do not include ZWJ; as a matter of fact, default
grapheme clusters, except for Hangul Jamo Syllables and a few exceptional cases,
are identical with combining sequences.
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
- *Tailored* grapheme clusters may include longer sequences, but it is not at
all obvious whether they would ever contain ZWJ or ZWNJ.
>...rendering of text works on grapheme clusters
- Rendering units are, in general, orthogonal to whether a sequence is a
grapheme cluster or not. "fi" may be a ligature in English, but is certainly not
a grapheme cluster.
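The first point can be checked against the Unicode Character Database. The following is a minimal sketch. It assumes Python 3.9+ and its standard `unicodedata` module, whose data reflects a much later Unicode version than the one current in 2003, although these particular properties have not changed. It prints the general category and canonical combining class of the characters discussed in this thread. The helper name `props` is hypothetical.

```python
import unicodedata

# Characters discussed in this thread, keyed by a short label.
CHARS = {
    "VS1": "\uFE00",          # VARIATION SELECTOR-1
    "CGJ": "\u034F",          # COMBINING GRAPHEME JOINER
    "ZWJ": "\u200D",          # ZERO WIDTH JOINER
    "ZWNJ": "\u200C",         # ZERO WIDTH NON-JOINER
    "HOLAM": "\u05B9",        # HEBREW POINT HOLAM
    "METEG": "\u05BD",        # HEBREW POINT METEG
    "HATAF PATAH": "\u05B2",  # HEBREW POINT HATAF PATAH
}

def props(ch: str) -> tuple[str, int]:
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

for label, ch in CHARS.items():
    gc, ccc = props(ch)
    # A category starting with "M" marks a combining character;
    # ccc == 0 only means the character is a starter for normalization.
    print(f"U+{ord(ch):04X} {label:12} gc={gc} ccc={ccc} mark={gc.startswith('M')}")
```

VS1 and CGJ both come out as combining marks (gc=Mn) with combining class 0. Each is therefore a "starter" for normalization, yet still a combining character rather than a base character. ZWJ and ZWNJ are format controls (gc=Cf).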
Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄
----- Original Message -----
From: "Philippe Verdy" <verdy_p@wanadoo.fr>
To: "Peter Kirk" <peterkirk@qaya.org>
Cc: <unicode@unicode.org>
Sent: Sat, 2003 Nov 08 17:15
Subject: Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination
> I'm curious about what name you would give to it.
> The name COMBINING CHARACTER JOINER is already used...
>
> In all our discussions we should have used the term "starter" (instead of
> just "base character", which is ambiguous) for any character of combining
> class 0; starters include:
>
> Base characters (includes conjoining characters):
> letter, syllable or ideograph (gc=L*),
> number (gc=N*),
> punctuation (gc=P*),
> symbol (gc=S*),
> space (gc=Zs)
> agreed private use characters (gc=Co and private agreement)
> Starter Combining characters:
> (gc=M* and CC=0) such as CGJ
> Controls:
> (gc=C* except Co),
> Text separators:
> (gc=Zl, Zp)
> Unknown private use characters:
> (gc=Co and no private agreement)
>
> For other characters with combining class > 0, we should have used the term
> "non-starter", not the term "combining character" which may or may not be a
> "starter".
>
> It is clear, however, that we made a distinction between "combining sequences"
> (made of a single starter, optionally followed by non-starters) and
> "grapheme clusters" (which are made of one or more combining sequences). For
> example, the (hypothetical) encoded text:
>
> <ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, METEG, CGJ, HATAF PATAH>
>
> is made of 7 "combining sequences":
>
> <ALEF>,
> <ZWJ>,
> <LAMED>,
> <VAV>,
> <VS1, HOLAM>,
> <NUN, HATAF PATAH>,
> <CGJ, METEG>
>
> (where the starters are VAV, VS1, NUN, CGJ),
> and 3 "grapheme clusters":
>
> <ALEF, ZWJ, LAMED>,
> <VAV, VS1, HOLAM>,
> <NUN, HATAF PATAH, CGJ, METEG>
>
> (ZWJ is a format control and is ignored when determining grapheme
> cluster boundaries).
>
> Grapheme clusters may be created by grouping several combining sequences
> without using CGJ, ZWJ, ZWNJ, or variation selectors: see examples in South
> Asian scripts and in Hangul jamo.
>
> Generally, collation and rendering of text works on grapheme clusters (or
> groups of these clusters, with language-specific tailoring), but not on
> combining sequences, whose role relates either to string identity
> without any notion of relative order (i.e. normalization and canonical
> equivalence) or to text transforms and folding.
>
> Compatibility equivalence is also defined, but on neither combining
> sequences nor grapheme clusters: there may be a mapping from one
> character (i.e. only part of a combining sequence) to several characters
> that belong to distinct combining sequences and distinct grapheme clusters,
> for example with some ligatures of base letters (the "ffi" ligature,
> which participates in only one combining sequence and only one grapheme
> cluster, is mapped to three distinct combining sequences and three distinct
> grapheme clusters).
>
> ----- Original Message -----
> From: "Peter Kirk" <peterkirk@qaya.org>
> To: <hebrew@unicode.org>
> Sent: Sunday, November 09, 2003 1:20 AM
> Subject: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination
>
>
> > So that you don't try to hold your breath over the weekend to find out
> > what I am planning to propose, as announced on the main Unicode list...
> >
> > The issue in question is the ligation of hataf vowels and meteg. Hataf
> > vowels with medial meteg are clear cases of ligatures between the basic
> > vowels and meteg. But there seems to be no mechanism in Unicode so far
> > to promote such a ligature. So, my suggestion is to propose a new
> > combining character COMBINING CHARACTER JOINER (combining class zero),
> > defined with semantics similar to ZWJ rather than CGJ, i.e. to affect
> > ligation but not collation.
> >
> > Comments?
> >
> > --
> > Peter Kirk
> > peter@qaya.org (personal)
> > peterkirk@qaya.org (work)
> > http://www.qaya.org/
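Philippe's Hebrew example and the hataf/meteg question can be explored with a short Python sketch built on the standard `unicodedata` module. The segmentation rule is a simplified assumption: every character that is not a combining mark (gc=M*) starts a new sequence, and each mark attaches to the sequence before it. The Standard's formal definition of a combining character sequence differs in details, for instance in how ZWJ and ZWNJ are treated, so the split is illustrative only. The function name `combining_sequences` is hypothetical. Under this assumption, VS1 and CGJ attach to the preceding letter rather than starting sequences of their own. The sketch also shows that NFD reorders METEG (ccc 22) after HATAF PATAH (ccc 12) and that an intervening CGJ (ccc 0) blocks the reordering. Finally, it shows the compatibility decomposition of the "ffi" ligature.

```python
import unicodedata

ALEF, LAMED, VAV, NUN = "\u05D0", "\u05DC", "\u05D5", "\u05E0"
HOLAM, METEG, HATAF_PATAH = "\u05B9", "\u05BD", "\u05B2"
ZWJ, VS1, CGJ = "\u200D", "\uFE00", "\u034F"

def combining_sequences(text: str) -> list[str]:
    """Split text into runs of one non-mark followed by any gc=M* characters.

    Simplified assumption: only combining marks (gc=M*) extend a sequence.
    """
    seqs: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and seqs:
            seqs[-1] += ch   # attach the mark to the current sequence
        else:
            seqs.append(ch)  # a non-mark (or a leading mark) starts a new one
    return seqs

# Philippe's hypothetical text, in the order given in his message.
text = ALEF + ZWJ + LAMED + VAV + VS1 + HOLAM + NUN + METEG + CGJ + HATAF_PATAH
for seq in combining_sequences(text):
    print(" + ".join(unicodedata.name(c) for c in seq))

# NFD sorts adjacent non-starters by combining class: METEG (22) moves after HATAF PATAH (12).
print(unicodedata.normalize("NFD", NUN + METEG + HATAF_PATAH) == NUN + HATAF_PATAH + METEG)
# CGJ has combining class 0, so it blocks that reordering.
print(unicodedata.normalize("NFD", NUN + METEG + CGJ + HATAF_PATAH) == NUN + METEG + CGJ + HATAF_PATAH)

# Compatibility decomposition: one ligature character becomes three characters.
print(unicodedata.normalize("NFKC", "\uFB03"))
```

With this simplified rule, the ten characters fall into five sequences, not seven. Both NFD comparisons print `True`, and the last line prints `ffi`.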
This archive was generated by hypermail 2.1.5 : Sun Nov 09 2003 - 00:11:21 EST