From: Mark Davis (mark.davis@icu-project.org)
Date: Wed Feb 01 2006 - 10:11:57 CST
No, that's not sufficient; there are some edge cases. In ICU we
preprocess and store a number of pieces of data that are very useful in
optimizing normalization, such as:
a) those characters that can't combine or reorder with anything in front
of them
b) those characters that can't combine or reorder with anything behind them
c) if a character were to be decomposed, what the ccc (canonical
combining class) of the first character of the decomposition would be,
and what the ccc of the last character would be;
and so on.
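As an illustration only, properties (a) through (c) can be approximated with Python's standard unicodedata module. This is a hypothetical sketch, not ICU's data or code. It assumes that "combining" with a neighbour means forming a primary composite (found by scanning canonical decompositions that NFC recomposes). It adds the Hangul jamo by hand because Hangul composition is algorithmic. It reads (a) as "nothing before the character can interact with it" and (b) as "nothing after it can". The names boundary_before, boundary_after and decomposed_ccc_range are made up for this example.

```python
import sys
import unicodedata

def _primary_composite_pairs():
    """Yield (first, second) for each canonical pair that NFC recomposes."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        d = unicodedata.decomposition(ch)
        if not d or d.startswith("<"):
            continue  # no decomposition, or a compatibility-only one
        parts = d.split()
        if len(parts) == 2 and unicodedata.normalize("NFC", ch) == ch:
            yield chr(int(parts[0], 16)), chr(int(parts[1], 16))

_PAIRS = list(_primary_composite_pairs())
COMBINES_FORWARD = {a for a, _ in _PAIRS}   # may compose with a following char
COMBINES_BACKWARD = {b for _, b in _PAIRS}  # may compose with a preceding char

# Hangul composition is algorithmic (L+V -> LV, LV+T -> LVT): add jamo by hand.
COMBINES_FORWARD.update(chr(c) for c in range(0x1100, 0x1113))   # L
# V is marked conservatively: it ends the NFD of every LV syllable.
COMBINES_FORWARD.update(chr(c) for c in range(0x1161, 0x1176))   # V
COMBINES_BACKWARD.update(chr(c) for c in range(0x1161, 0x1176))  # V
COMBINES_BACKWARD.update(chr(c) for c in range(0x11A8, 0x11C3))  # T

def boundary_before(ch):
    """(a): nothing before ch can combine or reorder with it."""
    first = unicodedata.normalize("NFD", ch)[0]
    return unicodedata.combining(first) == 0 and first not in COMBINES_BACKWARD

def boundary_after(ch):
    """(b): nothing after ch can combine or reorder with it."""
    last = unicodedata.normalize("NFD", ch)[-1]
    return unicodedata.combining(last) == 0 and last not in COMBINES_FORWARD

def decomposed_ccc_range(ch):
    """(c): ccc of the first and last characters of ch's canonical decomposition."""
    nfd = unicodedata.normalize("NFD", ch)
    return unicodedata.combining(nfd[0]), unicodedata.combining(nfd[-1])
```

For example, U+01D5 (U with diaeresis and macron) decomposes to U + U+0308 + U+0304, so decomposed_ccc_range gives (0, 230). Plain "a" has (a) but not (b), since a following U+0301 composes with it.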
If you run into a 'Maybe' character, then you can use the above
information plus other UCD properties to find the minimal span that you
need to worry about. (A character that is completely stable under NFC
has both (a) and (b), but you can do a somewhat better job if you have
the two pieces of information separately.)
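A minimal sketch of the idea, under stated assumptions. Python's unicodedata does not expose the NFC_Quick_Check property, so the NO and MAYBE sets below are derived from the decomposition data. MAYBE is assumed to match NFC_QC=Maybe, the set of characters that can compose with a preceding character. The span around a 'Maybe' character is bounded using property (a) only; using (b) as well could shrink it further, which this sketch does not attempt. The functions is_nfc, minimal_span and has_boundary_before are hypothetical names, and this is not ICU's implementation.

```python
import sys
import unicodedata

# Derived approximations of NFC_Quick_Check (assumption: not read from the UCD).
NO = set()     # characters that can never appear in NFC text
MAYBE = set()  # characters that may compose with a preceding character
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    d = unicodedata.decomposition(ch)
    if not d or d.startswith("<"):
        continue  # no canonical decomposition
    if unicodedata.normalize("NFC", ch) != ch:
        NO.add(ch)  # singleton, composition exclusion, or non-starter decomposition
    elif len(d.split()) == 2:
        MAYBE.add(chr(int(d.split()[1], 16)))  # second half of a primary composite
# Hangul composes algorithmically: medial vowels and trailing consonants.
MAYBE.update(chr(c) for c in range(0x1161, 0x1176))
MAYBE.update(chr(c) for c in range(0x11A8, 0x11C3))

def has_boundary_before(ch):
    """Property (a): nothing before ch can combine or reorder with it."""
    first = unicodedata.normalize("NFD", ch)[0]
    return unicodedata.combining(first) == 0 and first not in MAYBE

def minimal_span(s, i):
    """Smallest [start, end) around index i delimited by property-(a) characters."""
    start = i
    while start > 0 and not has_boundary_before(s[start]):
        start -= 1
    end = i + 1
    while end < len(s) and not has_boundary_before(s[end]):
        end += 1
    return start, end

def is_nfc(s):
    """Quick check, normalizing only the span around each 'Maybe' character."""
    last_ccc = 0
    for i, ch in enumerate(s):
        ccc = unicodedata.combining(ch)
        if ccc != 0 and last_ccc > ccc:
            return False  # non-starters out of canonical order
        if ch in NO:
            return False
        if ch in MAYBE:
            start, end = minimal_span(s, i)
            piece = s[start:end]
            if unicodedata.normalize("NFC", piece) != piece:
                return False
        last_ccc = ccc
    return True
```

For example, in "xe\u0301y" the 'Maybe' character U+0301 at index 2 gives the span (1, 3), so only "e\u0301" is normalized, and is_nfc returns False because it composes to "\u00e9". On Python 3.8+ the results can be cross-checked against unicodedata.is_normalized("NFC", s).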
Mark
Tim Greenwood wrote:
>Annex 8 of UAX #15 (Normalization Forms) describes the quick lookup
>property of Yes/No/Maybe for determining if a string is NFC. When I
>get a 'Maybe' is it sufficient to do the fuller analysis from the
>previous 'Yes' character? In other words (I think), is the previous
>'Yes' character a stable NFC code point? From the annex it seems to be
>not, but I cannot think of an example.
>
>Can anyone provide an example where I would get a stream of 'Yes'
>followed by a 'Maybe' where the fuller analysis needs to start before
>the previous 'Yes'?
>
>Thanks
>Tim
This archive was generated by hypermail 2.1.5 : Wed Feb 01 2006 - 10:17:48 CST