From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 08 2003 - 20:15:13 EST
I'm curious about what name you would give to it.
The name COMBINING CHARACTER JOINER is already used...
In all our discussions we should have used the term "starter" (instead of
just "base character", which is ambiguous) for any character of combining
class 0. Starters include:
Base characters (includes conjoining characters):
    letter, syllable or ideograph (gc=L*),
    number (gc=N*),
    punctuation (gc=P*),
    symbol (gc=S*),
    space (gc=Zs),
    agreed private use characters (gc=Co and private agreement)
Starter combining characters:
    (gc=M* and CC=0) such as CGJ
Controls:
    (gc=C* except Co)
Text separators:
    (gc=Zl, Zp)
Unknown private use characters:
    (gc=Co and no private agreement)
For other characters with combining class > 0, we should have used the term
"non-starter", not the term "combining character" which may or may not be a
"starter".
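The starter / non-starter distinction can be checked mechanically. The following is only a minimal sketch, assuming Python's standard unicodedata module; the helper name is_starter is hypothetical and not part of any Unicode API. It also shows that CGJ is both a combining mark (gc=Mn) and a starter.

```python
import unicodedata

def is_starter(ch: str) -> bool:
    """Hypothetical helper: True if ch has canonical combining class 0."""
    return unicodedata.combining(ch) == 0

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
ALEF = "\u05D0"   # HEBREW LETTER ALEF
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
METEG = "\u05BD"  # HEBREW POINT METEG

for name, ch in [("ALEF", ALEF), ("CGJ", CGJ), ("HOLAM", HOLAM), ("METEG", METEG)]:
    kind = "starter" if is_starter(ch) else "non-starter"
    # Print the general category and combining class next to the verdict.
    print(name, unicodedata.category(ch), unicodedata.combining(ch), kind)
```

Note that the general category alone does not decide the question: CGJ and HOLAM are both gc=Mn, but only HOLAM has a non-zero combining class.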
It is clear however that we made a distinction between "combining sequences"
(made of a unique starter and optionally followed by non-starters) and
"grapheme clusters" (which are made of one or more combining sequences). For
example, the (hypothetical) encoded text:
<ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, METEG, CGJ, HATAF PATAH>
is made of 7 "combining sequences":
<ALEF>,
<ZWJ>,
<LAMED>,
<VAV>,
<VS1, HOLAM>,
<NUN, METEG>,
<CGJ, HATAF PATAH>
(where the starters are ALEF, ZWJ, LAMED, VAV, VS1, NUN, CGJ),
and 3 "grapheme clusters":
<ALEF, ZWJ, LAMED>,
<VAV, VS1, HOLAM>,
<NUN, METEG, CGJ, HATAF PATAH>
(ZWJ is a format control and is ignored in the determination of grapheme
cluster boundaries).
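The splitting into combining sequences described above (one starter, followed by any non-starters) can be sketched as follows. This is an illustration only, assuming Python's unicodedata module; the function name combining_sequences is hypothetical. It does not attempt the grouping into grapheme clusters, which needs further rules.

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Hypothetical helper: split text so that each piece begins with a
    starter (combining class 0) and continues with non-starters."""
    sequences: list[str] = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not sequences:
            sequences.append(ch)   # a starter opens a new sequence
        else:
            sequences[-1] += ch    # a non-starter attaches to the previous one
    return sequences

# <ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, METEG, CGJ, HATAF PATAH>
text = "\u05D0\u200D\u05DC\u05D5\uFE00\u05B9\u05E0\u05BD\u034F\u05B2"

for seq in combining_sequences(text):
    print([unicodedata.name(ch) for ch in seq])
```

Run on the example text, the loop yields 7 sequences, one per starter, taken in the order in which the characters are encoded.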
Grapheme clusters may be created by grouping several combining sequences
without using CGJ, ZWJ, ZWNJ, or variation selectors: see examples in South
Asian scripts, and with Hangul jamo.
Generally, collation and rendering of text work on grapheme clusters (or
groups of these clusters with language-specific tailoring), not on
combining sequences, whose role relates either to string identity
excluding any concept of relative order (i.e. normalization and canonical
equivalence), or to text transforms or folding.
Compatibility equivalence is also defined, but on neither combining
sequences nor grapheme clusters: there may be a mapping from one
character (i.e. only a part of a combining sequence) to several characters
that belong to distinct combining sequences and distinct grapheme clusters,
for example with some ligatures of base letters (example: the "ffi"
ligature, which participates in only 1 combining sequence and only 1
grapheme cluster, is mapped to 3 distinct combining sequences and 3 distinct
grapheme clusters).
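The "ffi" case can be observed directly. A small sketch, assuming Python's unicodedata module and taking U+FB03 LATIN SMALL LIGATURE FFI as the ligature in question:

```python
import unicodedata

LIGATURE = "\uFB03"  # LATIN SMALL LIGATURE FFI

canonical = unicodedata.normalize("NFD", LIGATURE)       # canonical decomposition
compatibility = unicodedata.normalize("NFKD", LIGATURE)  # compatibility decomposition

# Canonical decomposition leaves the ligature as a single character.
print(len(canonical), [unicodedata.name(ch) for ch in canonical])
# Compatibility decomposition maps it to three separate starters.
print(len(compatibility), [unicodedata.name(ch) for ch in compatibility])
```

Each of the three resulting letters has combining class 0, so each begins its own combining sequence.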
----- Original Message -----
From: "Peter Kirk" <peterkirk@qaya.org>
To: <hebrew@unicode.org>
Sent: Sunday, November 09, 2003 1:20 AM
Subject: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination
> So that you don't try to hold your breath over the weekend to find out
> what I am planning to propose, as announced on the main Unicode list...
>
> The issue in question is the ligation of hataf vowels and meteg. Hataf
> vowels with medial meteg are clear cases of ligatures between the basic
> vowels and meteg. But there seems to be no mechanism in Unicode so far
> to promote such a ligature. So, my suggestion is to propose a new
> combining character COMBINING CHARACTER JOINER (combining class zero),
> defined with semantics similar to ZWJ rather than CGJ i.e. to affect
> ligation but not collation.
>
> Comments?
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/
>
>
>
This archive was generated by hypermail 2.1.5 : Sat Nov 08 2003 - 21:01:56 EST