On Sun, 10 Dec 2017 21:14:18 -0800
Manish Goregaokar via Unicode <unicode_at_unicode.org> wrote:
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> You can also explicitly request ligatureification with a ZWJ, so
> perhaps this rule should be something like
>
> (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
>
> -Manish
>
> On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode <unicode_at_unicode.org> wrote:
>
> > 1. You make a good point about the GB9c. It should probably instead
> > be something like:
> >
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant

This change is unnecessary. If we start from Draft 1, which has:
GB9: × (Extend | ZWJ | Virama)
GB9c: (Virama | ZWJ ) × LinkingConsonant
If the classes used in the rules are to be disjoint, we then have to
split Extend into something like ViramaExtend and OtherExtend to allow
normalised (NFC/NFD) text, at which point we may as well continue to
have rules that work without any normalisation. Informally,
ViramaExtend = Extend and ccc ≠ 0.
OtherExtend = Extend and ccc = 0.
(We might need to put additional characters in ViramaExtend.)
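
As an illustration only, the informal split can be sketched with Python's standard unicodedata module. Three things here are assumptions rather than the draft's definitions: the Extend test (general category Mn or Me, a rough stand-in for the real Grapheme_Cluster_Break=Extend property), treating canonical combining class 9 as Virama, and the hypothetical LINKING_CONSONANTS sample (Devanagari KA..HA).

```python
import unicodedata

ZWJ = "\u200D"

# Hypothetical LinkingConsonant sample for illustration only:
# Devanagari KA (U+0915) through HA (U+0939). Not the real UCD class.
LINKING_CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def gcb_class(ch: str) -> str:
    """Approximate class of ch for the rules discussed in this thread.

    Assumptions (not the UCD Grapheme_Cluster_Break property):
    - Virama: canonical combining class (ccc) 9.
    - Extend: general category Mn or Me.
    """
    if ch == ZWJ:
        return "ZWJ"
    ccc = unicodedata.combining(ch)
    if ccc == 9:
        return "Virama"
    if unicodedata.category(ch) in ("Mn", "Me"):
        # Informally: ViramaExtend = Extend and ccc != 0,
        #             OtherExtend  = Extend and ccc == 0.
        return "ViramaExtend" if ccc != 0 else "OtherExtend"
    if ch in LINKING_CONSONANTS:
        return "LinkingConsonant"
    return "Other"


# Devanagari virama, nukta (ccc 7), vowel sign U (ccc 0), ZWJ, KA
for c in "\u094D\u093C\u0941\u200D\u0915":
    print(f"U+{ord(c):04X}", gcb_class(c))
```

Under these assumptions the nukta (ccc 7) lands in ViramaExtend, while a ccc = 0 mark such as the vowel sign U lands in OtherExtend.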
This gives us rules:
GB9': × (OtherExtend | ViramaExtend | ZWJ | Virama)
GB9c': (Virama | ZWJ ) ViramaExtend* × LinkingConsonant
So, for a sequence <virama, ZWJ, nukta, LinkingConsonant>, GB9' gives us
virama × ZWJ × nukta LinkingConsonant
and GB9c' gives us
virama × ZWJ × nukta × LinkingConsonant
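
A minimal sketch of GB9' and GB9c' acting on an already-classified sequence reproduces the two lines above. The class names follow the rules in this message; the functions gb9_prime, gb9c_prime and render are hypothetical helpers, and a plain space in the output only means that neither rule forbids a break at that position.

```python
# Classes that GB9' keeps attached to the preceding character.
GB9_TARGETS = {"OtherExtend", "ViramaExtend", "ZWJ", "Virama"}


def gb9_prime(classes, i):
    """GB9': x (OtherExtend | ViramaExtend | ZWJ | Virama)."""
    return classes[i] in GB9_TARGETS


def gb9c_prime(classes, i):
    """GB9c': (Virama | ZWJ) ViramaExtend* x LinkingConsonant."""
    if classes[i] != "LinkingConsonant":
        return False
    j = i - 1
    # Skip back over any run of ViramaExtend characters.
    while j >= 0 and classes[j] == "ViramaExtend":
        j -= 1
    return j >= 0 and classes[j] in ("Virama", "ZWJ")


def render(labels, classes, rules):
    """Join labels with ' × ' where some rule forbids a break, else ' '."""
    out = labels[0]
    for i in range(1, len(labels)):
        joined = any(rule(classes, i) for rule in rules)
        out += (" × " if joined else " ") + labels[i]
    return out


labels = ["virama", "ZWJ", "nukta", "LinkingConsonant"]
classes = ["Virama", "ZWJ", "ViramaExtend", "LinkingConsonant"]
print(render(labels, classes, [gb9_prime]))
# → virama × ZWJ × nukta LinkingConsonant
print(render(labels, classes, [gb9_prime, gb9c_prime]))
# → virama × ZWJ × nukta × LinkingConsonant
```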

In Rule GB9c, what examples justify including ZWJ? Are they just the C1 half-forms? My knowledge suggests that

GB9c'': Virama (ZWJ | ViramaExtend)* × LinkingConsonant

might be more appropriate.

Richard.

Received on Mon Dec 11 2017 - 04:17:10 CST