I have a few questions/comments about the new emoji segmentation rules in 9.0.0:
1. I have trouble understanding what the ^ symbol means in these rules:
http://www.unicode.org/reports/tr29/proposed.html#GB8a
http://www.unicode.org/reports/tr29/proposed.html#WB15
Does it correspond to the regexp start-of-line (SOL) symbol? If so, it is a bit ambiguous in this context: it could also be read as matching the start of each line, which is a whole different business. Couldn't it simply be replaced by sot?
2. Besides, given that the GB8* rules require knowing whether an odd number of RI precedes the position, it seems to me that the sentence "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters." is no longer accurate.
3. There are two rules named GB8c.
4. In §1.1 the link to UTS18 is broken (#RegEx does not exist in UAX 41).
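To illustrate point 2, here is a minimal Python sketch of my reading of the RI rules, roughly (RI RI)* RI × RI. The function names are hypothetical, and it only handles regional indicator pairing, not the other grapheme cluster rules. It suggests that deciding whether to break between two RI requires scanning back over the whole preceding run of RI, not just looking at the two adjacent characters.

```python
def is_regional_indicator(ch: str) -> bool:
    """True if ch is a Regional Indicator symbol (U+1F1E6..U+1F1FF)."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def ri_break_before(text: str, i: int) -> bool:
    """Decide whether there is a boundary between text[i-1] and text[i].

    Assumption: both characters are RI, and only the RI pairing rule
    is considered (a sketch, not a full UAX #29 implementation).
    """
    if not 0 < i < len(text):
        raise ValueError("i must be an interior position")
    if not (is_regional_indicator(text[i - 1]) and is_regional_indicator(text[i])):
        raise ValueError("both adjacent characters must be RI")

    # Count the run of RI ending at text[i-1]; this may walk back
    # arbitrarily far, up to the start of the text.
    count = 0
    j = i - 1
    while j >= 0 and is_regional_indicator(text[j]):
        count += 1
        j -= 1

    # Odd run before the position: text[i] completes a pair, so no break.
    # Even run: the pairs are already complete, so break here.
    return count % 2 == 0


# Two flags back to back: RI(F) RI(R) RI(D) RI(E)
flags = "\U0001F1EB\U0001F1F7\U0001F1E9\U0001F1EA"
no_break_inside_first = ri_break_before(flags, 1)
break_between_flags = ri_break_before(flags, 2)
no_break_inside_second = ri_break_before(flags, 3)
```

Positions 1 and 3 have identical immediate neighbours to position 2 (RI on both sides), yet only position 2 is a boundary; the answer depends on the parity of the preceding run.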
Best,
Daniel
Received on Tue Jun 21 2016 - 11:03:10 CDT