UAX 29 questions

Philippe Verdy verdy_p at
Fri Jan 30 00:36:03 CST 2015

The main reason is that the rest of the text does not test pairs starting
with Format or Extend, but any character that precedes the Format and Extend
characters. By saying "ignore", it just says: while parsing from the start to
the end of the text, skip these characters and leave unchanged the state
variable that keeps the WB property of the last non-ignored character.
In fact the rule is:
Any  × (Format | Extend)+
but this matches more than a simple pair of characters, and it is used in all
the rest of the rules.
So effectively all other rules do not contain any reference to Format and
Extend.
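
To make the "state variable" reading concrete, here is a minimal Python
sketch of it. It is only an illustration under stated assumptions, not the
UAX 29 algorithm: wb_class() and word_breaks() are hypothetical names, the
property lookup is a rough approximation built on general categories (Cf for
Format, Mn/Me for Extend, isalpha() for ALetter) rather than the real
Word_Break data, and only WB1/WB2, WB4, WB5 and WB14 are modelled.

```python
import unicodedata


def wb_class(ch):
    """Rough stand-in for the Word_Break property (an approximation, not UCD data)."""
    cat = unicodedata.category(ch)
    if cat == "Cf":
        return "Format"
    if cat in ("Mn", "Me"):
        return "Extend"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def word_breaks(text):
    """Return break offsets, modelling only WB1/WB2, WB4, WB5 and WB14."""
    if not text:
        return []
    breaks = [0]              # WB1: break at the start of the text
    prev = wb_class(text[0])  # state: class of the last non-ignored character
    for i in range(1, len(text)):
        cur = wb_class(text[i])
        if cur in ("Format", "Extend"):
            # WB4 read as "Any x (Format | Extend)+": no break before the
            # character, and the state variable is left untouched.
            continue
        if not (prev == "ALetter" and cur == "ALetter"):
            breaks.append(i)  # WB14: otherwise, break
        # else WB5: ALetter x ALetter, no break
        prev = cur
    breaks.append(len(text))  # WB2: break at the end of the text
    return breaks


print(word_breaks("A\u200d"))    # [0, 2]
print(word_breaks("A\u200dB"))   # [0, 3]
print(word_breaks("\u200dA"))    # [0, 1, 2]
```

With this reading there is no break between "A" and the ZWJ, and the ZWJ does
not hide the "A" from a following letter. A Format character at the very
start of the text is not ignored, so a break follows it.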

2015-01-30 6:25 GMT+01:00 Karl Williamson <public at>:

> On 01/29/2015 08:19 PM, Philippe Verdy wrote:
>> 2015-01-29 19:52 GMT+01:00 Karl Williamson <public at>:
>>     Rule WB4 is
>>     "Ignore Format and Extend characters, except when they appear at the
>>     beginning of a region of text.".
>>     Not clearly stated, but it appears to me that the ZWJ must be
>>     considered here to be the beginning of a region of text, as we are
>>     looking at the boundary between it and the "A".  No rule
>>     specifically mentions ALetter followed by an Extend, so by the
>>     default rule, WB14
>>     "Otherwise, break everywhere (including around ideographs)"
>> All the text is targeted at finding candidate positions for breaks. It
>> is not very clear that "ignore" is definitive and means that there
>> cannot be any further breaks before the Format and Extend characters,
>> except at the beginning of text. So all the rest of the rules are ignored:
>> there was a match and you stop there; no break before;
>>    Any  × (Format | Extend)
>> This is confirmed in other rules that state the word "otherwise",
>> including the last one (WB14) you quote which is explicitly not
>> applicable.
> I don't understand you here.  I understand all the words, but I don't see
> what you're trying to say.  My claim is that there should be a rule,
> as you give:
>  Any  × (Format | Extend)
> but there isn't.  I think you are maybe trying to say that the word
> "ignore" in this UAX is tantamount to such a rule.  I am a native English
> speaker, and would never have drawn that inference from the text.  There
> are a lot of passages in the Standard that sound like gibberish to me.  I
> know the words' meanings, but the combinations don't make any sense.  I
> don't recall ever having this issue in other standards I've looked at.