Re: UAX 29 questions

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 30 Jan 2015 07:36:03 +0100

The main reason is that the rest of the text does not test pairs starting
with Format or Extend, but rather the character (Any) that precedes the
Format and Extend characters.
By saying "ignore", it just means: while parsing from the start to the end
of the text, keep in a state variable the WB property of the last
non-ignored character.
In fact the rule is:
Any × (Format | Extend)+
but this matches more than a simple pair of characters. It is used in all
the rest of the rules.
So effectively none of the other rules contains any reference to Format or
Extend.
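
To make the "state variable" reading concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not the UAX 29
algorithm: the input is a list of WB property names that some caller has
already looked up, the function name word_breaks is hypothetical, and apart
from the "ignore" behaviour only two rules are modelled (ALetter × ALetter,
and "otherwise break"). Newline handling and all other rules are left out.

```python
IGNORED = {"Format", "Extend"}

def word_breaks(props):
    """Return the indices i (0 < i < len(props)) where a break is allowed
    before props[i]. `props` is a list of WB property names, one per
    character. Toy rule set only; see the assumptions in the text above."""
    breaks = []
    state = None  # WB property of the last non-ignored character
    for i, prop in enumerate(props):
        if i == 0:
            # Start of text: the first character counts as itself,
            # even if it is Format or Extend.
            state = prop
            continue
        if prop in IGNORED:
            # Any × (Format | Extend): no break here, and the state
            # keeps the property of the earlier non-ignored character.
            continue
        if state == "ALetter" and prop == "ALetter":
            pass  # ALetter × ALetter: no break
        else:
            breaks.append(i)  # otherwise, break
        state = prop
    return breaks
```

With this sketch, word_breaks(["ALetter", "Extend", "ALetter"]) finds no
break at all: the Extend is attached to the first letter, and the second
letter is then tested against "ALetter", not against "Extend".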

2015-01-30 6:25 GMT+01:00 Karl Williamson <public_at_khwilliamson.com>:

> On 01/29/2015 08:19 PM, Philippe Verdy wrote:
>
>>
>> 2015-01-29 19:52 GMT+01:00 Karl Williamson <public_at_khwilliamson.com>:
>>
>> Rule WB4 is
>>
>> "Ignore Format and Extend characters, except when they appear at the
>> beginning of a region of text.".
>>
>> Not clearly stated, but it appears to me that the ZWJ must be
>> considered here to be the beginning of a region of text, as we are
>> looking at the boundary between it and the "A". No rule
>> specifically mentions ALetter followed by an Extend, so by the
>> default rule, WB14
>>
>> "Otherwise, break everywhere (including around ideographs)"
>>
>>
>> All the text is targeted at finding candidate positions for breaks. It
>> is not very clear that "ignore" is definitive and means that there
>> cannot be any further breaks before the Format and Extend characters,
>> except at beginning of text. So all the rest of the rules is ignored, there
>> was a match and you stop there; no break before;
>>
>> Any × (Format | Extend)
>>
>> This is confirmed in other rules that state the word "otherwise",
>> including the last one (WB14) you quote which is explicitly not
>> applicable.
>>
>
> I don't understand you here. I understand all the words, but I don't see
> what you're trying to say. My claim is that there should be a rule:
> as you give
>
> Any × (Format | Extend)
>
> but there isn't. I think you are maybe trying to say that the word
> "ignore" in this UAX is tantamount to such a rule. I am a native English
> speaker, and would never have drawn that inference from the text. There
> are a lot of passages in the Standard that sound like gibberish to me. I
> know the words' meanings, but the combinations don't make any sense. I
> don't recall ever having this issue in other standards I've looked at.
>

Received on Fri Jan 30 2015 - 00:37:46 CST
