L2/07-383
Date: Mon, 15 Oct 2007 13:18:58 -0700
From: Mark Davis
Subject: Small change to UAX#29
======
I was looking over #29, and realized that there is an issue
in SB11. With the latest change, we pulled out CR and LF explicitly, so we
have:
    SB11.  ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?  ÷
At first, I thought this was a problem: CRLF is a valid sequence, so the
rule should cover it as well, giving the modified rule:
    SB11.  ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF | CR LF )?  ÷
Then I realized that we don't need the final clause at all. We already have:
    SB3.   CR  ×  LF
    SB4.   Sep | CR | LF  ÷
So we will handle CRLF correctly anyway.
Therefore, the original SB11 (at the top here) actually works. However, it
is conceptually muddied by the last clause: we could replace SB11 by the
simpler:
    SB11.  ( STerm | ATerm ) Close* Sp*  ÷
A small thing, but clearer, I think, so I'd recommend doing it.
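The equivalence of the two forms of SB11 can be checked mechanically. What
follows is a minimal sketch, not an implementation of UAX #29: it assumes a
one-letter encoding for the Sentence_Break classes, models only SB3, SB4,
SB9, SB10, SB11, and SB12, and leaves out every other rule (so ATerm is
treated exactly like STerm here). All names in it are hypothetical.

```python
import re
from itertools import product

# Assumed one-letter codes standing in for Sentence_Break classes:
# T=STerm  A=ATerm  C=Close  S=Sp  P=Sep  R=CR  N=LF  X=any other class
CODES = "TACSPRNX"

SB9_LEFT = re.compile(r"[TA]C*$")
SB10_LEFT = re.compile(r"[TA]C*S*$")
SB11_ORIGINAL = re.compile(r"[TA]C*S*[PRN]?$")  # with the ( Sep | CR | LF )? clause
SB11_SIMPLER = re.compile(r"[TA]C*S*$")         # without it


def breaks(classes, sb11):
    """Return the interior offsets where a sentence boundary falls."""
    found = []
    for i in range(1, len(classes)):
        left, nxt = classes[:i], classes[i]
        if left[-1] == "R" and nxt == "N":            # SB3: CR x LF
            continue
        if left[-1] in "PRN":                         # SB4: Sep | CR | LF, then break
            found.append(i)
            continue
        if SB9_LEFT.search(left) and nxt in "CSPRN":  # SB9: no break
            continue
        if SB10_LEFT.search(left) and nxt in "SPRN":  # SB10: no break
            continue
        if sb11.search(left):                         # SB11: break
            found.append(i)
        # SB12: otherwise, no break
    return found


def same_everywhere(max_len=4):
    """Compare both SB11 variants on every class string up to max_len."""
    for n in range(1, max_len + 1):
        for combo in product(CODES, repeat=n):
            s = "".join(combo)
            if breaks(s, SB11_ORIGINAL) != breaks(s, SB11_SIMPLER):
                return False
    return True
```

In this model, whenever the left context ends in Sep, CR, or LF, the
decision is already made by SB3 or SB4 before SB11 is consulted, which is
why the optional clause never changes the outcome.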
We should also add an informative note that some implementations may have
mechanisms that allow them to forbid breaking within a sequence of
characters. In such a case, an implementation could boil rules SB9, SB10,
and SB11 down to a single rule: don't break within:
    ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?  ÷
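As an illustration of such a mechanism, here is a rough sketch that treats
the sequence as one unbreakable unit and breaks only after it. It uses an
assumed one-letter encoding of the Sentence_Break classes, and the function
and pattern names are hypothetical. Two things in it are my additions rather
than part of the pattern above: CR LF is folded into the unit so that SB3
still holds, and a second alternative covers a bare Sep, CR, or LF so that
SB4 still breaks after it. All other rules are left out.

```python
import re

# Assumed one-letter codes standing in for Sentence_Break classes:
# T=STerm  A=ATerm  C=Close  S=Sp  P=Sep  R=CR  N=LF  X=any other class
#
# First alternative: the unbreakable terminator sequence (CR LF kept whole).
# Second and third alternatives: a line separator with no terminator before it.
UNIT = re.compile(r"[TA]C*S*(?:RN|[PRN])?|RN|[PRN]")


def breaks_by_unit(classes):
    """Break after each unit, except at the very end of the text."""
    return [m.end() for m in UNIT.finditer(classes) if m.end() < len(classes)]
```

For example, `breaks_by_unit("XTCSRNX")` puts a single boundary after the
CR LF pair, at offset 6, and none inside the terminator sequence.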
--
Mark