I was looking at the feedback at http://www.unicode.org/review/pri355/, and
didn't see yours there. Could you please file your feedback there? (Nothing
on this list is tracked by the committee...)
FYI, I'm thinking now that the change should be:
GB9c: (Virama | ZWJ ) × LinkingConsonant
=>
GB9c: (Virama ViramaExtend* | ZWJ ) × LinkingConsonant
where ViramaExtend = [Extend - Virama - \p{ccc=0}]
(This is pre-partitioning.)
That is close to your formulation, but for canonical equivalence there
shouldn't be any need to allow the ViramaExtend after ZWJ, because the ZWJ has
ccc=0, and thus nothing reorders around it.
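
A small Python sketch of the canonical-equivalence point, using only the
standard unicodedata module. The sample characters (Devanagari KA, SSA, nukta
and virama) are chosen purely for illustration and are not taken from the
proposal or the test files; the helper names are made up.

```python
import unicodedata

# Illustrative characters only (assumption: any Devanagari consonants would do).
KA, SSA = "\u0915", "\u0937"
NUKTA, VIRAMA, ZWJ = "\u093c", "\u094d", "\u200d"


def ccc(ch):
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def nfd(s):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)


# Two canonically equivalent spellings: the nukta (lower nonzero ccc) and the
# virama (higher nonzero ccc) may appear in either order in unnormalized text.
virama_then_nukta = KA + VIRAMA + NUKTA + SSA
nukta_then_virama = KA + NUKTA + VIRAMA + SSA

# With ZWJ (ccc=0) in front of the mark, there is nothing to reorder.
zwj_then_nukta = KA + ZWJ + NUKTA + SSA

print([ccc(c) for c in (NUKTA, VIRAMA, ZWJ)])
print(nfd(virama_then_nukta) == nukta_then_virama)
print(nfd(zwj_then_nukta) == zwj_then_nukta)
```

Since both spellings of the virama sequence are equivalent, the rule has to
tolerate a nonzero-ccc Extend between the Virama and the LinkingConsonant; no
such equivalent spelling exists across a ZWJ.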
Cibu also pointed out on a different thread that for Malayalam we need to
consider a couple of other forms:
... Following contexts should be allowed for requesting reformed or
traditional conjuncts as per Unicode10.0.0/ch12 page 505. ...
/$L ZWNJ $V $L/
/$L ZWJ $V $L/
The ZWJ Virama sequence is already provided for by the combination of GB9
& GB9c, but the ZWNJ Virama sequence is not. If we want to handle that, it
would mean the addition of something like:
GB9d: × (ZWNJ ViramaExtend* Virama)
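
A rough Python sketch of how GB9, the revised GB9c and a GB9d like the one
above would interact on a sequence of class labels. Everything here is an
assumption for illustration: the labels are hypothetical and treated as
disjoint (that is, post-partitioning), ZWNJ gets its own label only so that
GB9d can be written down as stated, the function names are made up, and all
other rules in #29 are ignored (anything not joined is treated as a break).

```python
# Hypothetical, disjoint class labels (assumption: post-partitioning).
LC, V, VE, OE = "LinkingConsonant", "Virama", "ViramaExtend", "OtherExtend"
ZWJ, ZWNJ = "ZWJ", "ZWNJ"


def joins(classes, i):
    """True if the sketched rules forbid a break before classes[i] (i >= 1)."""
    cur = classes[i]

    # GB9: x (Extend | ZWJ | Virama)
    if cur in {VE, OE, ZWJ, V}:
        return True

    # GB9c (revised): (Virama ViramaExtend* | ZWJ) x LinkingConsonant
    if cur == LC:
        j = i - 1
        if classes[j] == ZWJ:
            return True
        while j >= 0 and classes[j] == VE:
            j -= 1
        if j >= 0 and classes[j] == V:
            return True

    # GB9d (as floated): x (ZWNJ ViramaExtend* Virama)
    if cur == ZWNJ:
        j = i + 1
        while j < len(classes) and classes[j] == VE:
            j += 1
        if j < len(classes) and classes[j] == V:
            return True

    return False


def clusters(classes):
    """Group the labels into clusters, breaking wherever no rule joins."""
    out = []
    for i, c in enumerate(classes):
        if i > 0 and joins(classes, i):
            out[-1].append(c)
        else:
            out.append([c])
    return out


print(clusters([LC, V, VE, LC]))    # Virama ViramaExtend* x LinkingConsonant
print(clusters([LC, ZWJ, V, LC]))   # ZWJ Virama via GB9 + GB9c
print(clusters([LC, ZWNJ, V, LC]))  # ZWNJ Virama via GB9d + GB9 + GB9c
print(clusters([LC, ZWJ, VE, LC]))  # no ViramaExtend allowed after ZWJ
```

In this sketch the first three sequences each come out as a single cluster,
while the last one breaks before the final consonant.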
Cibu also wrote:
Also, when we disallow /$L $V ZWJ $D/, it is disallowing the sequences
involving legacy chillus. That is, for example, <CHILLU N, VOWEL SIGN E> is
a valid sequence (Examples in Unicode10.0.0/ch12 Table 12.36). Its legacy
equivalent would be <NA, VIRAMA, ZWJ, VOWEL SIGN E>. It might be OK to
disallow this; but, we should be mindful of this side effect.
To account for the legacy cases, the simplest approach might be to add
some characters to GCB=LinkingConsonant.
Note:
The final date for deciding exactly what to do with UAX #29 will be in April,
so there is some more time to discuss this. But we have to have a pretty
solid proposal going into that April meeting.
The only test files that we have gotten from India so far include
Devanagari, Malayalam and Bengali. I suspect that the UTC is likely to be
conservative, and limit the GCB=Virama category to just those scripts that
we have test files for, and that look complete.
Mark
On Mon, Dec 11, 2017 at 2:16 AM, Richard Wordingham via Unicode <
unicode_at_unicode.org> wrote:
> On Sun, 10 Dec 2017 21:14:18 -0800
> Manish Goregaokar via Unicode <unicode_at_unicode.org> wrote:
>
> > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
> >
> > You can also explicitly request ligatureification with a ZWJ, so
> > perhaps this rule should be something like
> >
> > (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
> >
> > -Manish
> >
> > On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode <
> > unicode_at_unicode.org> wrote:
> >
> > > 1. You make a good point about the GB9c. It should probably instead
> > > be something like:
> > >
> > > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> This change is unnecessary. If we start from Draft 1 where there are:
>
> GB9: × (Extend | ZWJ | Virama)
> GB9c: (Virama | ZWJ ) × LinkingConsonant
>
> If the classes used in the rules are to be disjoint, we then have to
> split Extend into something like ViramaExtend and OtherExtend to allow
> normalised (NFC/NFD) text, at which point we may as well continue to
> have rules that work without any normalisation. Informally,
>
> ViramaExtend = Extend and ccc ≠ 0.
>
> OtherExtend = Extend and ccc = 0.
>
> (We might need to put additional characters in ViramaExtend.)
>
> This gives us rules:
>
> GB9': × (OtherExtend | ViramaExtend | ZWJ | Virama)
>
> GB9c': (Virama | ZWJ ) ViramaExtend* × LinkingConsonant
>
> So, for a sequence <virama, ZWJ, nukta, LinkingConsonant>, GB9' gives us
>
> virama × ZWJ × nukta LinkingConsonant
>
> and GB9c' gives us
>
> virama × ZWJ × nukta × LinkingConsonant
>
> ---
> In Rule GB9c, what examples justify including ZWJ? Are they just the C1
> half-forms? My knowledge suggests that
>
> GB9c'': Virama (ZWJ | ViramaExtend)* × LinkingConsonant
>
> might be more appropriate.
>
> Richard.
>
>
Received on Mon Jan 22 2018 - 00:35:00 CST