On Thu, 03 Sep 2015 09:32:42 -0700 Rick McGowan <rick@unicode.org> wrote:

> A proposed update to the LDML specification (UTS #35) will be available for review as of Monday, September 7 at 06:00 GMT. The open review period closes on Monday, September 14 at 06:00 GMT. (This is a short review period, because CLDR 28 is scheduled for release in the week of September 16.)
>
> The proposed update will be at http://unicode.org/reports/tr35/proposed.html
>
> To report bugs in the specification, please use http://unicode.org/cldr/trac/newticket

Have the implications of adding string ranges to Unicode sets been considered? I'm mentioning them on the list because their impact goes beyond locales, and I haven't worked out their implications myself.

By my reading, adding string ranges will initially make regular expression engines that don't use ICU non-compliant with Level 1 of UTS #18 Unicode Regular Expressions, in particular RL1.3 'subtraction and intersection'. I don't imagine the extra work of set operations on Unicode sets containing string ranges will be popular. It may be worst for the minority of regular expression engines that use the regularity of regular expressions.

I note that the safety feature of requiring the start and end points to have the same length has been removed from their design.
String ranges seem particularly vulnerable to the ill-effects of unpredictable normalisation.
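
The length and normalisation concerns can be made concrete with a minimal sketch. It assumes the expansion rule reads as follows: a range X-Y is valid when len(X) >= len(Y) > 0 in code points, X is split into a prefix P and a suffix S with len(S) = len(Y), and the range denotes P followed by every combination of code points taken column by column from S to Y. The function name `expand_string_range` is hypothetical, and the rule should be checked against the proposed text of UTS #35 rather than taken from this sketch.

```python
import unicodedata
from itertools import product


def expand_string_range(start, end):
    """Expand a string range under the ASSUMED rule described above.

    Hypothetical helper for illustration only; not an ICU or CLDR API.
    Lengths are counted in code points (Python 3 str semantics).
    """
    if not end or len(start) < len(end):
        raise ValueError("assumed validity rule: len(start) >= len(end) > 0")
    split = len(start) - len(end)
    prefix, suffix = start[:split], start[split:]
    # One column of candidate code points per position of the end string.
    # A descending column is empty, so the whole range is empty in this sketch.
    columns = [
        [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]
        for lo, hi in zip(suffix, end)
    ]
    return [prefix + "".join(combo) for combo in product(*columns)]


# Equal-length and shorter-end ranges can denote the same strings.
same_length = expand_string_range("ab", "ad")
shorter_end = expand_string_range("ab", "d")

# The same endpoints in NFC and in NFD denote different sets.
nfc_range = expand_string_range("\u00e9", "\u00eb")
nfd_range = expand_string_range(
    unicodedata.normalize("NFD", "\u00e9"),
    unicodedata.normalize("NFD", "\u00eb"),
)
print(len(nfc_range), len(nfd_range))
```

Under these assumptions the range from U+00E9 to U+00EB expands to three strings when the endpoints are in NFC, but to eight when they are in NFD, so normalising the endpoints changes the set the range denotes. Normalising only the end point of a range such as a to U+00E9 makes the end longer than the start, which the assumed validity rule rejects.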
Richard.