From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Apr 22 2003 - 14:45:54 EDT
Peter Constable wrote:
> Jungshik Shin wrote on 04/21/2003 09:27:04 PM:
>
> > I think two cases are distinct. In bidi text, bouncing back and forth
> > is across grapheme boundaries while in what James described, it's
> > within a single grapheme.
>
> Well, wasn't the point of James' comments: to determine whether the Indic
> sequences *should* be considered a grapheme?
It's up to implementations, applications, and graphologists to
decide.
The UTC made a brief foray onto the unforgiving ground of trying
to determine grapheme status and grapheme boundaries, but after
wrestling with the issue of trying to define "unithood" inside
Indic orthographic syllables, backed off again.
UAX #29 now has a very streamlined definition of "default
grapheme cluster boundaries" which basically amounts to
trying to keep boundaries from falling within sequences of
base letters + non-spacing marks or within sequences of
jamos that constitute a Korean syllable. That's it.
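As a rough illustration of the rule just summarized, here is a minimal Python sketch. It is not the full UAX #29 rule set: it only refuses to break before a non-spacing or enclosing mark, or inside a Hangul jamo sequence. The function names (`hangul_type`, `no_break_between`, `default_clusters`) and the jamo ranges are my own simplifying assumptions, not anything taken from the UAX text.

```python
import unicodedata

def hangul_type(ch):
    """Classify a character as a Hangul L, V, T, LV, or LVT unit (simplified ranges)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"    # leading consonant jamo
    if 0x1160 <= cp <= 0x11A7:
        return "V"    # vowel jamo
    if 0x11A8 <= cp <= 0x11FF:
        return "T"    # trailing consonant jamo
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None

def no_break_between(prev, cur):
    """Return True if no cluster boundary should fall between prev and cur."""
    # Keep non-spacing (Mn) and enclosing (Me) marks with what precedes them.
    if unicodedata.category(cur) in ("Mn", "Me"):
        return True
    # Keep jamo sequences that form one Korean syllable together.
    p, c = hangul_type(prev), hangul_type(cur)
    if p == "L" and c in ("L", "V", "LV", "LVT"):
        return True
    if p in ("LV", "V") and c in ("V", "T"):
        return True
    if p in ("LVT", "T") and c == "T":
        return True
    return False

def default_clusters(text):
    """Split text into clusters using only the two rules above."""
    clusters = []
    for ch in text:
        if clusters and no_break_between(clusters[-1][-1], ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

With this sketch, `default_clusters("e\u0301x")` keeps the acute accent with its base letter, and the jamo sequence U+1112 U+1161 U+11AB stays in one cluster. A Devanagari sequence such as KA + VIRAMA + SSA (U+0915 U+094D U+0937), by contrast, comes out as two clusters, because the virama is simply treated as a non-spacing mark on KA.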
UAX #29 default grapheme cluster boundaries don't even attempt
to specify whether Devanagari consonant conjuncts, or
aksharas, or orthographic syllables, or Indic constructs involving
vowels behaving as chunks of conjunct forms, or whatnot constitute
graphemes. Such determinations are basically out-of-scope for
Unicode, in my opinion.
--Ken