Re: Grapheme cluster boundaries and left-side spacing dependent vowels

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Apr 22 2003 - 15:55:31 EDT

Next message: Addison Phillips [wM]: "RE: regular expressions with unicode situation?"

Previous message: Mark Davis: "Re: regular expressions with unicode situation?"
In reply to: Kenneth Whistler: "Re: Grapheme cluster boundaries and left-side spacing dependent vowels"
Next in thread: Jungshik Shin: "Re: Grapheme cluster boundaries and left-side spacing dependent vowels"

To add to what Ken has said: UAX #29 defines default grapheme cluster
boundaries. These form a well-defined core that can be very useful in
language-independent processing, but for a particular language a tailored
grapheme cluster, consisting of one or more default grapheme clusters, may
be more useful. Examples of this are given in UAX #29.
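
As a rough illustration of the distinction, here is a minimal Python sketch.
It is not the UAX #29 algorithm. It only approximates Ken's summary quoted
below (no break before a non-spacing or enclosing mark, no break inside a
sequence of conjoining jamos), and it assumes the jamo ranges of the
U+1100..U+11FF block only. The function names and the Devanagari virama
tailoring are hypothetical, chosen just to show a tailored cluster being
built from one or more default clusters.

```python
import unicodedata

def _jamo_kind(ch):
    """Classify a conjoining jamo in U+1100..U+11FF (assumed ranges)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"  # leading consonant
    if 0x1160 <= cp <= 0x11A7:
        return "V"  # vowel
    if 0x11A8 <= cp <= 0x11FF:
        return "T"  # trailing consonant
    return None

# Jamo pairs that stay together in a syllable of the form L* V+ T*.
_JAMO_JOIN = {("L", "L"), ("L", "V"), ("V", "V"), ("V", "T"), ("T", "T")}

def _no_break_between(prev, cur):
    # Simplification: treat Mn/Me as the marks that attach to a base.
    if unicodedata.category(cur) in ("Mn", "Me"):
        return True
    return (_jamo_kind(prev), _jamo_kind(cur)) in _JAMO_JOIN

def default_clusters(text):
    """Split text into simplified 'default' clusters."""
    clusters = []
    for ch in text:
        if clusters and _no_break_between(clusters[-1][-1], ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

def tailored_clusters(text):
    """Hypothetical tailoring: merge a default cluster that ends in a
    virama with the default cluster that follows it."""
    merged = []
    for cluster in default_clusters(text):
        if merged and merged[-1].endswith(VIRAMA):
            merged[-1] += cluster
        else:
            merged.append(cluster)
    return merged

if __name__ == "__main__":
    conjunct = "\u0915\u094D\u0937"  # KA + VIRAMA + SSA
    print(len(default_clusters(conjunct)), len(tailored_clusters(conjunct)))
```

With this sketch the KA + VIRAMA + SSA sequence comes out as two default
clusters but as one tailored cluster, so the tailored result is always a
concatenation of whole default clusters. A left-side spacing vowel such as
U+093F has category Mc, so the sketch leaves a boundary before it.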

Mark
(مرقص بن داود)
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>
To: <Peter_Constable@sil.org>
Cc: <unicode@unicode.org>; <kenw@sybase.com>
Sent: Tuesday, April 22, 2003 11:45
Subject: Re: Grapheme cluster boundaries and left-side spacing dependent vowels

> Peter Constable wrote:
>
> > Jungshik Shin wrote on 04/21/2003 09:27:04 PM:
> >
> > > I think two cases are distinct. In bidi text, bouncing back and forth
> > > is across grapheme boundaries while in what James described, it's
> > > within a single grapheme.
> >
> > Well, wasn't the point of James' comments: to determine whether the Indic
> > sequences *should* be considered a grapheme?
>
> It's up to implementations, applications, and graphologists to
> decide.
>
> The UTC made a brief foray onto the unforgiving ground of trying
> to determine grapheme status and grapheme boundaries, but after
> wrestling with the issue of trying to define "unithood" inside
> Indic orthographic syllables, backed off again.
>
> UAX #29 now has a very streamlined definition of "default
> grapheme cluster boundaries" which basically amounts to
> trying to keep boundaries from falling within sequences of
> base letters + non-spacing marks or within sequences of
> jamos that constitute a Korean syllable. That's it.
> UAX #29 default grapheme cluster boundaries don't even attempt
> to specify whether Devanagari consonant conjuncts, or
> aksharas, or orthographic syllables, or Indic constructs involving
> vowels behaving as chunks of conjunct forms, or whatnot constitute
> graphemes. Such determinations are basically out-of-scope for
> Unicode, in my opinion.
>
> --Ken


This archive was generated by hypermail 2.1.5 : Tue Apr 22 2003 - 16:34:20 EDT