Re: Grapheme cluster boundaries and left-side spacing dependent vowels

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Apr 22 2003 - 14:45:54 EDT

    Peter Constable wrote:

    > Jungshik Shin wrote on 04/21/2003 09:27:04 PM:
    >
    > > I think two cases are distinct. In bidi text, bouncing back and forth
    > > is across grapheme boundaries while in what James described, it's
    > > within a single grapheme.
    >
    > Well, wasn't the point of James' comments: to determine whether the Indic
    > sequences *should* be considered a grapheme?

    It's up to implementations, applications, and graphologists to
    decide.

    The UTC made a brief foray onto the unforgiving ground of trying
    to determine grapheme status and grapheme boundaries, but after
    wrestling with the issue of trying to define "unithood" inside
    Indic orthographic syllables, backed off again.

    UAX #29 now has a very streamlined definition of "default
    grapheme cluster boundaries" which basically amounts to
    trying to keep boundaries from falling within sequences of
    base letters + non-spacing marks or within sequences of
    jamos that constitute a Korean syllable. That's it.
    UAX #29 default grapheme cluster boundaries don't even attempt
    to specify whether Devanagari consonant conjuncts, or
    aksharas, or orthographic syllables, or Indic constructs involving
    vowels behaving as chunks of conjunct forms, or whatnot constitute
    graphemes. Such determinations are basically out-of-scope for
    Unicode, in my opinion.

    --Ken
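
To make the two cases named in the message concrete, here is a minimal sketch in Python. It is not the normative UAX #29 algorithm. The function names are hypothetical, "non-spacing marks" is approximated by General_Category Mn/Me instead of the Grapheme_Extend property, and the remaining rules (such as not breaking between CR and LF) are omitted.

```python
import unicodedata


def hangul_type(ch):
    """Classify a character as L, V, T, LV, or LVT; None if not Hangul."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"    # leading consonant jamo
    if 0x1160 <= cp <= 0x11A7:
        return "V"    # vowel jamo
    if 0x11A8 <= cp <= 0x11FF:
        return "T"    # trailing consonant jamo
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def is_nonspacing_mark(ch):
    # Assumption: Mn/Me stands in for Grapheme_Extend in this sketch.
    return unicodedata.category(ch) in ("Mn", "Me")


def no_break_between(a, b):
    """True if no cluster boundary should fall between a and b."""
    if is_nonspacing_mark(b):
        return True
    ha, hb = hangul_type(a), hangul_type(b)
    if ha == "L" and hb in ("L", "V", "LV", "LVT"):
        return True
    if ha in ("LV", "V") and hb in ("V", "T"):
        return True
    if ha in ("LVT", "T") and hb == "T":
        return True
    return False


def clusters(text):
    """Split text into clusters using only the two simplified rules above."""
    out = []
    for ch in text:
        if out and no_break_between(out[-1][-1], ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


# Base letter + non-spacing mark stays together: 2 clusters.
print(len(clusters("e\u0301a")))
# Conjoining jamos L + V + T form one Korean syllable: 1 cluster.
print(len(clusters("\u1112\u1161\u11ab")))
# Devanagari KA + VOWEL SIGN I (a spacing mark): 2 clusters under these rules.
print(len(clusters("\u0915\u093f")))
```

Under this sketch the left-side vowel sign U+093F (category Mc) ends up in its own cluster, and a conjunct such as KA + VIRAMA + SSA is split after the virama. That matches the point above: rules of this shape do not try to treat an orthographic syllable as a unit.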


