From: Jon Hanna (jon@hackcraft.net)
Date: Wed Feb 01 2006 - 09:59:03 CST
Tim Greenwood wrote:
> Annex 8 of UAX #15 (Normalization Forms) describes the quick lookup
> property of Yes/No/Maybe for determining if a string is NFC. When I
> get a 'Maybe' is it sufficient to do the fuller analysis from the
> previous 'Yes' character? In other words (I think) is the previous
> 'yes' character a stable NFC code point? From the annex it seems to be
> not, but I cannot think of an example.
The stable NFC code-points are those which are both "Yes" for the quick
checks, and have a combining class of 0.
Remember that the quick check tests both the derived normalisation
property (yes/no/maybe) and also that the combining marks are in
canonical order. If you have a "maybe" with a combining class of 0 you
will have to search forwards up to, but not including, the next character
with a combining class of 0. If you have a "maybe" combiner following a
"yes" character with a combining class of 0 you need to check if those
two characters have a canonical composition.
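As a rough illustration of those two checks, here is a minimal Python sketch using only the standard unicodedata module. The function names in_canonical_order and pair_composes are made up for this example, and since Python does not expose the yes/no/maybe property directly, the composition test simply assumes that if NFC makes the pair shorter then the two characters composed. It is a sketch under those assumptions, not a full quick-check implementation.

```python
import unicodedata

def in_canonical_order(s):
    """True if no non-zero combining class is lower than the class just before it."""
    last = 0
    for ch in s:
        ccc = unicodedata.combining(ch)  # canonical combining class, 0 for starters
        if ccc != 0 and ccc < last:
            return False
        last = ccc
    return True

def pair_composes(starter, mark):
    """Hypothetical helper: does a "yes" starter plus a "maybe" mark compose under NFC?

    Assumption: the starter is already NFC on its own, so a shorter
    result can only come from the two characters composing.
    """
    pair = starter + mark
    return len(unicodedata.normalize("NFC", pair)) < len(pair)

# U+0323 COMBINING DOT BELOW has class 220, U+0301 COMBINING ACUTE ACCENT has class 230.
print(in_canonical_order("a\u0323\u0301"))  # marks in ascending class order
print(in_canonical_order("a\u0301\u0323"))  # marks out of order
print(pair_composes("e", "\u0301"))         # e + acute has a precomposed form
print(pair_composes("q", "\u0301"))         # q + acute does not
```

If either check fails for a run, that run is the part that needs the fuller normalisation pass.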
In practice the gain from further optimising is going to be slight.