Re: New Public Review Issue: Proposed Update UTS #18

From: Mike (mike-list@pobox.com)
Date: Tue Sep 25 2007 - 13:38:48 CDT

Next message: Marnen Laibow-Koser: "Re: New Public Review Issue: Proposed Update UTS #18"

Previous message: Jon Hanna: "Re: Marks"
In reply to: Mark Davis: "Re: New Public Review Issue: Proposed Update UTS #18"
Next in thread: Doug Ewell: "Re: New Public Review Issue: Proposed Update UTS #18"

> Having named character sequences in \N is an interesting idea. Would you
> mind proposing that to the UTC using the online form? (That's the way to
> raise issues to the UTC's attention.)

Done.

> BTW, Andy and I concluded that the really effective way to do canonical
> equivalence in regex would be in a mode where grapheme cluster is the
> unit, not code point.

I'm starting to think that we may need to support both modes.
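
To illustrate how the two modes differ, here is a minimal Python sketch. It is an approximation only: the helpers `clusters` and `cluster_equal` are hypothetical names, and a "cluster" here is assumed to be one code point plus any following combining marks, which is far simpler than the real grapheme cluster rules in UAX #29.

```python
import re
import unicodedata


def clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption: a cluster is one code point plus any following code
    points with a nonzero canonical combining class. Real segmentation
    (UAX #29) has more rules than this.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch   # attach the combining mark to the previous unit
        else:
            out.append(ch)  # start a new unit
    return out


def cluster_equal(a, b):
    """Compare two clusters under canonical equivalence (via NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u00e9"     # e-acute as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# Code point mode: "." consumes only the base letter of the decomposed form.
code_point_match = re.match(".", decomposed).group()

# Cluster mode: the unit is the whole cluster, so both spellings line up.
cluster_match = clusters(decomposed)[0]
```

In code point mode the match stops after the bare "e", so the two spellings of the same text behave differently. With the cluster as the unit, `cluster_equal(cluster_match, composed)` holds, which is the property canonical equivalence asks for.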

> On the comment on "feasible" -- I think the reference there was to
> language/locale-sensitive regex. That involves a few things which are
> quite tricky, and are thus listed under Level 3 in UTS#18.
>
> * sensitivity: "aa" matches a-ring in Danish
> * language-sensitive ordering ranges: [a-z] doesn't include o-slash
> in Danish
> * language-sensitive grapheme clusters: a dot matches "ch" in Slovak
> * ...
>
> Few implementations try to handle locale-sensitivity except for POSIX
> (and that has significant problems in it). I wouldn't say that they are
> infeasible, but they are tricky.

Lots of programming problems are tricky. If I just gave up every
time I ran into a tricky problem, my software wouldn't be very
useful....

Being able to match grapheme clusters in regular expressions is
a requirement for Level 2 conformance, so I'm just trying to be
compliant here. If I can also figure out how to make "." match
the grapheme clusters a user specifies, such as "ch", what is
wrong with that?
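
As an illustration only (not taken from any existing regex engine), user-specified clusters could be sketched in Python as below. The names `tailored_units` and `match_dots` are hypothetical, and the sketch assumes case-sensitive, longest-match-first splitting with no other tailoring.

```python
def tailored_units(text, extra_clusters):
    """Split text into matching units.

    Each string in extra_clusters is treated as one unit; anything else
    falls back to a single code point. Longest clusters are tried first.
    Assumption: matching is case-sensitive, so "Ch" is not covered by "ch".
    """
    # Ignore empty strings so the scan always makes progress.
    ordered = sorted((c for c in extra_clusters if c), key=len, reverse=True)
    units = []
    i = 0
    while i < len(text):
        for cluster in ordered:
            if text.startswith(cluster, i):
                units.append(cluster)
                i += len(cluster)
                break
        else:
            units.append(text[i])
            i += 1
    return units


def match_dots(text, n_dots, extra_clusters=()):
    """Return what n_dots consecutive "." would consume at the start of
    text when each "." matches one unit, or None if text is too short."""
    units = tailored_units(text, extra_clusters)
    if len(units) < n_dots:
        return None
    return "".join(units[:n_dots])


slovak_clusters = ["ch"]  # user-specified cluster from the example above
word = "chlieb"

tailored = match_dots(word, 1, slovak_clusters)  # "." sees "ch" as one unit
untailored = match_dots(word, 1)                 # "." sees only "c"
```

Keeping the cluster list as an explicit argument leaves the default behavior untouched, so the tailoring applies only when the user asks for it.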

Mike

This archive was generated by hypermail 2.1.5 : Tue Sep 25 2007 - 13:42:04 CDT