From: Mark Davis ⌛ (mark@macchiato.com)
Date: Thu Aug 27 2009 - 19:44:11 CDT
The key is the statement "Ignore Format and Extend characters, except when
they appear at the beginning of a region of text. (See Section 6.2, Replacing
Ignore Rules <http://www.unicode.org/reports/tr29/tr29-14.html#Grapheme_Cluster_and_Format_Rules>.)"
If you do follow the link to 6.2, it explicitly says:

"Replace the “Ignore” rule by the following, to disallow breaks within
sequences *(except after CRLF and related characters)*:" [My bolding]

So it is not meant to apply to the sequence CR Extend.
The phrase "beginning of a region of text" should be clarified, because
a region is intended not to span a ÷ boundary introduced by previous rules
(in this case, S1-S3). I'll run this by the editorial committee to see if an
editorial change is in order.
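
To make the intended reading concrete, here is a rough sketch (not the
normative algorithm, and not taken from any implementation) of how the
"ignore" step could be modelled. It assumes characters have already been
mapped to their Sentence_Break property names; the function name
collapse_ignorables and the two constant sets are made up for illustration.

```python
# Hypothetical names, for illustration only.
BREAKERS = {"Sep", "CR", "LF"}      # a ÷ follows these, so a new region starts
IGNORABLE = {"Extend", "Format"}

def collapse_ignorables(classes):
    """Simplified 'ignore' rule: X (Extend | Format)* -> X,
    where X is anything other than Sep, CR, or LF.

    Returns (index, class) pairs for the characters that later rules still see.
    """
    visible = []
    prev = None  # class of the last character that was NOT absorbed
    for i, cls in enumerate(classes):
        # Extend/Format is absorbed only when something other than
        # Sep/CR/LF precedes it in the same region.
        absorbed = cls in IGNORABLE and prev is not None and prev not in BREAKERS
        if not absorbed:
            visible.append((i, cls))
            prev = cls
    return visible

# Eric's example: the Extend after LF starts a new region, so it is kept.
example = ["Lower", "Lower", "ATerm", "Close", "LF",
           "Extend", "Upper", "Lower", "Lower"]
print(collapse_ignorables(example))

# By contrast, an Extend after an ordinary character is absorbed.
print(collapse_ignorables(["Lower", "Extend", "ATerm"]))
```

In the first call every character stays visible, including the Extend
right after LF; in the second, the Extend disappears into the preceding
Lower.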
Mark
(I think Eric M might have been using an earlier version, because his
numbering was off. I'm looking at
http://www.unicode.org/reports/tr29/tr29-14.html)
On Thu, Aug 27, 2009 at 17:09, Eric Muller <emuller@adobe.com> wrote:
> Mark Davis ⌛ wrote:
>
>> Because SB5 doesn't match at the start, you keep on going through the
>> rules, and finally end up not breaking.
>>
>
> You are saying that SB11 does not apply on <lower lower aterm close lf ext
> ? upper lower lower> to use your example (with ? being the position
> examined).
>
> I just don't see how that can be inferred from the text. The rules in 6.2
> that describe the insertion of "(Extend|Format)*" do not say "insert only in
> some cases and not in others".
>
> On the other hand, looking at the rules in the CLDR (for root), I see that
> the variables Sep, CR and LF are not extended with $FE* like all the others.
> Hence my sudden doubt.
>
>> I agree that the wording is not as clear as it should be.
>>
>
> Especially if it doesn't say what you want it to say ;-)
>
> Thanks,
> Eric.
>
>