From: jon@hackcraft.net
Date: Fri Dec 12 2003 - 08:43:53 EST
Quoting Peter Kirk <peterkirk@qaya.org>:
[snip me quoting D17a]
> >
> >"in some way defective" is actually a good way to put it methinks, they aren't
> >illegal, and in some cases you can do things with them that are both reasonable
> >and useful, but in other situations they may be problematic.
> >
> Indeed. But I was thinking more in terms of grapheme clusters, as
> defined in UAX #29. Is a defective combining sequence a grapheme
> cluster? Probably not according to the definition "what the user thinks
> of as a character or basic unit of the language". But the boundary rule
> "/Break at the start and end of text./" implies that the algorithm will
> count a defective combining sequence at the start of text (and possibly
> what follows) as a default grapheme cluster. So it is "in some way
> defective" as a grapheme cluster as well as as a character sequence.
My understanding is that it would be counted, but I agree it doesn't
match "what the user thinks of as a character" very well. So it's a grapheme
cluster, but it's "in some way defective" :)
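To make the boundary behaviour concrete, here is a minimal sketch in Python. It is not the UAX #29 algorithm: it assumes a much simplified rule (never break before a character whose general category is a mark, otherwise break) and ignores CR LF, Hangul syllables and everything else the real rules cover. The names is_extending, simple_clusters and starts_defective are made up for illustration.

```python
import unicodedata


def is_extending(ch):
    # Assumption: treat any mark (Mn, Mc, Me) as "extending".
    # This is only a rough stand-in for the Extend property in UAX #29.
    return unicodedata.category(ch).startswith("M")


def simple_clusters(text):
    """Split text into approximate default grapheme clusters."""
    clusters = []
    for ch in text:
        if clusters and is_extending(ch):
            # No break before a mark: attach it to the preceding cluster.
            clusters[-1] += ch
        else:
            # Start of text, or a non-mark: a new cluster begins here.
            clusters.append(ch)
    return clusters


def starts_defective(text):
    # A combining sequence is defective when it has no base character,
    # which at the start of text means the first character is a mark.
    return bool(text) and is_extending(text[0])


sample = "\u0301abc"  # COMBINING ACUTE ACCENT with nothing in front of it
print(len(simple_clusters(sample)), starts_defective(sample))
```

Under this simplified rule the leading accent ends up as a cluster of its own, because the break at the start of text opens a cluster and the next non-mark opens another. So it gets counted, even though no user would call it a character.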
> I note the following in UAX #29, which backs up my comments on functions
> to count characters:
>
> > In those rare circumstances where end-users need character counts, the
> > counts should correspond to the grapheme cluster boundaries.
>
> This implies that end users should not require counts of code units or
> code points.
I don't think anyone argued against this being what *end* users require.
Certainly for small values of "end" anyway.
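For anyone who wants to see how far the three kinds of count can drift apart, here is a rough Python sketch. The cluster count uses the same kind of simplification as any quick approximation would (a new cluster starts at the start of text and before every non-mark), so it is an assumption and not the full UAX #29 rules. The function name count_units is hypothetical.

```python
import unicodedata


def count_units(text):
    """Return code unit, code point and approximate cluster counts."""
    def is_mark(ch):
        # Assumption: general category M* approximates "does not start a cluster".
        return unicodedata.category(ch).startswith("M")

    return {
        # Code units depend on the encoding form.
        "utf8_code_units": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # A Python 3 str is a sequence of code points.
        "code_points": len(text),
        # Approximate clusters: one at start of text, one per later non-mark.
        "approx_clusters": sum(
            1 for i, ch in enumerate(text) if i == 0 or not is_mark(ch)
        ),
    }


# "e" + COMBINING ACUTE ACCENT, followed by U+1D11E MUSICAL SYMBOL G CLEF
print(count_units("e\u0301\U0001D11E"))
```

For that sample the four numbers all differ (7, 4, 3 and 2), which is the reason a "character count" shown to an end user wants to be the last one, while code that allocates buffers or indexes storage still needs the others.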
-- Jon Hanna | Toys and books <http://www.hackcraft.net/> | for hospitals: | <http://santa.boards.ie>