Re: New Public Review Issue: Proposed Update UTS #18

From: Mike (mike-list@pobox.com)
Date: Mon Sep 24 2007 - 10:25:37 CDT

Next message: Gerrit Sangel: "Composition of not included Chinese characters"

Previous message: Doug Ewell: "Re: New Public Review Issue: Proposed Update UTS #18"
In reply to: Doug Ewell: "Re: New Public Review Issue: Proposed Update UTS #18"
Next in thread: Mark Davis: "Re: New Public Review Issue: Proposed Update UTS #18"
Reply: Mark Davis: "Re: New Public Review Issue: Proposed Update UTS #18"
Reply: Doug Ewell: "Re: New Public Review Issue: Proposed Update UTS #18"

> I don't think it will ever really be feasible to define regular
> expressions in terms of specific languages, to the point of treating
> combinations of two or more base characters as a single matchable
> "character" on the basis that speakers of language X consider the
> combination to be a single "letter."

It is feasible, and I already have working code.

There is no avoiding it. Consider [\uAC00-\uD7A3], which should
match any LV or LVT Hangul syllable. That character class must
match any of the precomposed characters in the range, and it
must also match any canonically equivalent sequence of jamos,
such as <U+1103 U+1167 U+11AB>.
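
To illustrate the point, here is a minimal Python sketch (not my actual
implementation) that tests membership in [\uAC00-\uD7A3] under canonical
equivalence. It assumes it is enough to normalize the candidate to NFC
and check for a single code point in the range; the function name
matches_hangul_syllable is made up for the example.

```python
import unicodedata

# Bounds of the precomposed Hangul syllable block, as in [\uAC00-\uD7A3].
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def matches_hangul_syllable(text: str) -> bool:
    """Return True if text is canonically equivalent to one Hangul syllable.

    Hypothetical helper: composes the input with NFC, then checks that the
    result is a single code point inside the class range.
    """
    composed = unicodedata.normalize("NFC", text)
    return len(composed) == 1 and HANGUL_FIRST <= ord(composed) <= HANGUL_LAST


jamo_sequence = "\u1103\u1167\u11AB"           # <U+1103 U+1167 U+11AB>
print(matches_hangul_syllable(jamo_sequence))  # conjoining jamo sequence
print(matches_hangul_syllable("\uB390"))       # precomposed syllable
print(matches_hangul_syllable("\u1103"))       # a lone leading jamo
```

A real matcher would not normalize each candidate separately, since it
has to find where the sequence ends inside a longer string, but the
membership test comes out the same.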

The specification gives the example [a-z\q{x\u0323}], which
allows American Indians to treat x with a dot below (U+0323) as a
single character even though there is no precomposed character for it.
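
Python's re module has no \q{...} syntax, so the following sketch only
approximates [a-z\q{x\u0323}] with an alternation that tries the
multi-code-point string before the single letters. The names
X_DOT_BELOW, LETTER and split_letters are invented for the example.

```python
import re

# The string literal from \q{x\u0323}: "x" followed by COMBINING DOT BELOW.
X_DOT_BELOW = "x\u0323"

# Try the longer sequence first so that "x" + U+0323 is consumed as one unit
# instead of matching a bare "x" and leaving the combining mark behind.
LETTER = re.compile("(?:" + re.escape(X_DOT_BELOW) + "|[a-z])")


def split_letters(text: str) -> list[str]:
    """Split text into the 'characters' matched by the emulated class."""
    return LETTER.findall(text)


print(split_letters("ax\u0323b"))
print(LETTER.fullmatch(X_DOT_BELOW) is not None)
```

The first call yields three items, with x plus the dot kept together as
the middle one. The second shows that the two-code-point sequence
matches the class as a whole.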

I also allow you to put named character sequences in a character
class, for example [\N{KATAKANA LETTER AINU P}]. Named sequences
always consist of multiple code points, by definition.
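
As a rough illustration in Python (again, not my own code): the
function unicodedata.lookup accepts named sequences as of Python 3.3,
so the expansion can be inspected directly. The wrapper name
expand_named_sequence is hypothetical.

```python
import unicodedata


def expand_named_sequence(name: str) -> str:
    """Return the code point sequence for a Unicode name or named sequence.

    Thin hypothetical wrapper around unicodedata.lookup, which raises
    KeyError if the name is unknown.
    """
    return unicodedata.lookup(name)


ainu_p = expand_named_sequence("KATAKANA LETTER AINU P")
# Show each code point of the sequence in U+XXXX notation.
print([f"U+{ord(ch):04X}" for ch in ainu_p])
print(len(ainu_p))
```

A regex engine that accepts \N{...} for named sequences inside a class
then has to treat the result as a string alternative, the same way it
treats \q{...}.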

Mike

This archive was generated by hypermail 2.1.5 : Mon Sep 24 2007 - 10:28:55 CDT