Minor flaw in rules for locating text element boundaries

From: Timothy Partridge (timpart@perdix.demon.co.uk)
Date: Mon May 15 2000 - 15:08:02 EDT

On page 125 of Unicode 3.0, rule 4 says

No overlapping sets. [snip] A later character set definition will override a
previous one, removing its characters from the previous set.

In the Line Boundaries section a large number of sets are defined on pages
129-130. Unfortunately the last set to be defined is
    All     All Unicode characters

Surely by strict interpretation of rule 4 this sucks all the characters out
of the previous sets? I know what you mean, but you don't mean what you say.
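
To make the strict reading concrete, here is a minimal Python sketch of rule 4
applied literally. The set names and the five-character "universe" are
hypothetical stand-ins, not the actual definitions on pages 129-130, and
apply_rule_4 is an illustrative helper, not anything taken from the standard.

```python
def apply_rule_4(definitions):
    """Literal reading of rule 4: each later set definition removes its
    characters from every previously defined set."""
    resolved = {}
    for name, chars in definitions:
        chars = set(chars)              # copy, so the caller's sets are untouched
        for earlier in resolved.values():
            earlier -= chars            # later definition overrides earlier ones
        resolved[name] = chars
    return resolved


# Toy universe standing in for "all Unicode characters" (an assumption).
universe = set("ab 1-")

# Hypothetical set names; "All" is defined last, as in the Line Boundaries section.
definitions = [
    ("Space", {" "}),
    ("Hyphen", {"-"}),
    ("Digit", {"1"}),
    ("All", universe),
]

strict = apply_rule_4(definitions)

# One possible reading of what was meant: treat "All" as if it were defined first,
# so that the more specific sets carve their characters out of it.
reordered = apply_rule_4([definitions[-1]] + definitions[:-1])

for name in ("Space", "Hyphen", "Digit", "All"):
    print(name, sorted(strict[name]), sorted(reordered[name]))
```

Under the strict reading every set except All ends up empty. With All moved to
the front, the specific sets keep their characters and All is left holding
only the remainder.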


P.S. This significantly increases the efficiency of implementations - line
breaks can occur before and after every character :-)

Tim Partridge. Any opinions expressed are mine only and not those of my employer
