From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Nov 09 2003 - 14:11:58 EST
Let's try to be clear on the terms.
Look at the definition of combining sequences:
D17 Combining character sequence: A character sequence consisting of either a
base character followed by a sequence of one or more combining characters, or a
sequence of one or more combining characters.
Thus a combining character sequence *cannot* contain a ZWJ or any other Cf.
Any use of a ZWJ before a combining mark produces a *defective* combining
character sequence (D17a), which isolates the combining mark from any preceding
base character.
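
To make the point concrete, here is a rough Python sketch of the D17 wording quoted above. It is not an official algorithm: the function names are made up for illustration, and the test for "base character" (anything that is neither a combining mark nor a control, format, surrogate, unassigned, or line/paragraph separator character) is my own simplifying assumption. It models only the definition as quoted in this message.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER, General Category Cf


def is_combining(ch):
    # Combining characters are taken to be General Category Mn, Mc, Me.
    return unicodedata.category(ch).startswith("M")


def is_base(ch):
    # Assumption for this sketch: a base character is any character that
    # is not a combining mark and not a control/format/other excluded type.
    cat = unicodedata.category(ch)
    return not cat.startswith("M") and cat not in (
        "Cc", "Cf", "Cs", "Cn", "Zl", "Zp")


def split_sequences(text):
    """Split text into (kind, chunk) pairs following the quoted D17 wording.

    kind is one of:
      "combining" - base character plus one or more combining characters
      "defective" - one or more combining characters with no base
      "base"      - a base character with no combining characters
      "other"     - anything else (for example a Cf character such as ZWJ)
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        j = i + 1
        if is_combining(ch):
            # Combining marks with nothing to attach to.
            while j < n and is_combining(text[j]):
                j += 1
            out.append(("defective", text[i:j]))
        elif is_base(ch):
            # A base character absorbs every combining mark that follows.
            while j < n and is_combining(text[j]):
                j += 1
            out.append(("combining" if j > i + 1 else "base", text[i:j]))
        else:
            # Cf and similar characters end the sequence and stand alone.
            out.append(("other", ch))
        i = j
    return out


if __name__ == "__main__":
    for sample in ("a\u0301", "a" + ZWJ + "\u0301"):
        print([(kind, [hex(ord(c)) for c in chunk])
               for kind, chunk in split_sequences(sample)])
```

With "a" followed by U+0301 the sketch reports a single combining character sequence. With a ZWJ inserted between them it reports a lone base character, the ZWJ on its own, and then a defective sequence holding only U+0301.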
And as I said earlier:
> - *Default* grapheme clusters do not include ZWJ; as a matter of fact, default
> grapheme clusters, except for Hangul Jamo Syllables and a few exceptional cases,
> are identical with combining sequences.
> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
> - *Tailored* grapheme clusters may include longer sequences, but it is not at
> all obvious whether they would ever contain ZWJ or ZWNJ.
I'll expand on the latter. What constitutes a tailored grapheme cluster is up to
a particular process, and so one could contain a ZWJ. However, any combining
mark after a ZWJ does *not* apply to a previous base character within that
tailored grapheme cluster, so the use of a ZWJ would isolate that combining
mark. Such a sequence would not correspond to anything used in a natural
language.
Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄
----- Original Message -----
From: "Peter Kirk" <peterkirk@qaya.org>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Sun, 2003 Nov 09 09:19
Subject: Re: ZWJ, ZWNJ, CGJ and combination
> On 08/11/2003 17:09, Mark Davis wrote:
>
> >I agree with the first part of your analysis. By the phrase "requesting ligation
> >of combining characters" it is unclear to me what you mean, and whether that is
> >the right solution to whatever problem you are referring to.
> >
> >Mark
> >__________________________________
> >http://www.macchiato.com
> >► शिष्यादिच्छेत्पराजयम् ◄
> >
> >
> >
> A further reply to this one:
>
> On the bidi list Paul Nelson pointed out that in Khmer ZWJ and ZWNJ do
> not break combining sequences; or at least they do not break grapheme
> clusters, which is not quite the same thing. And the same may be true of
> Indic scripts, although in the examples I found ZWJ/ZWNJ is always at
> the end of a combining sequence. Are ZWJ and ZWNJ actually used within
> combining character sequences (or what would be such sequences if not
> technically broken)? Is there some tension here with the general
> definition of combining character sequences?
>
> If Khmer really does do this, and unless there are any real objections
> to this practice, perhaps the best way ahead, rather than defining a new
> COMBINING CHARACTER JOINER and changing the Khmer encoding, is to adjust
> the definition of combining character sequences to allow ZWJ, ZWNJ and
> perhaps some other suitable layout control characters to be included
> within such sequences. This would allow the Hebrew issue to be solved in
> a way analogous to the Khmer issue.
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/
>
>
>