From: Mike (mike-list@pobox.com)
Date: Sun Jan 21 2007 - 17:07:15 CST
>> When I implemented collation, I needed to define code points for
>> the various contractions that can occur. To avoid clashing with
>> any private use code points, I chose to start allocating the
>> contractions at 0x110000. This has worked quite nicely.
>
> One problem with that solution is that it may work if you're working
> with extensions of UTF-8 or extensions of UTF-32, but just doesn't work
> with UTF-16. The other is that with the other two, especially extending
> UTF-8, you are quite likely to fall foul of defensive code guarding
> against impossible codepoints. It's a shame, for I had been about to
> suggest it.
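The UTF-16 objection follows from the surrogate arithmetic. 1024 high surrogates times 1024 low surrogates reach exactly 0x10000..0x10FFFF, so a value like 0x110000 has no encoding. Below is a minimal sketch of a one-code-point UTF-16 encoder. The function name `utf16_units` is hypothetical and is only meant to show where the ceiling comes from.

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as UTF-16 code units (illustrative sketch)."""
    # Surrogate code points and anything above U+10FFFF are not encodable.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not representable in UTF-16")
    if cp < 0x10000:
        return [cp]  # BMP characters take a single code unit
    v = cp - 0x10000  # 20-bit value, 0..0xFFFFF
    # Top 10 bits go into the high surrogate and the bottom 10 bits into the low one.
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

With this encoder, `utf16_units(0x10FFFF)` gives `[0xDBFF, 0xDFFF]`, the last possible pair, while `utf16_units(0x110000)` raises `ValueError`. Python's own `chr(0x110000)` also raises `ValueError`, which is the kind of defensive check the quoted reply warns about.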
The values 0x110000 and higher are only used internally to keep
track of contractions, and they never leak out into normal
character data. If they did, they'd be converted to 0xFFFD, the
replacement character, by my Char class anyway.
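A rough sketch of the scheme described above could look like the code below, assuming contractions are keyed by their source string. `ContractionTable`, `id_for`, and `to_char_value` are hypothetical names; this is not the actual implementation or its `Char` class. Internal IDs are handed out upward from 0x110000. Any value outside 0..0x10FFFF that reaches the character layer is replaced with 0xFFFD. The post only mentions the 0x110000-and-up case, so surrogate handling is left out here.

```python
FIRST_CONTRACTION_ID = 0x110000  # one past U+10FFFF, the last Unicode code point
REPLACEMENT_CHAR = 0xFFFD


class ContractionTable:
    """Hypothetical table assigning internal IDs to collation contractions."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next = FIRST_CONTRACTION_ID

    def id_for(self, sequence: str) -> int:
        # Reuse an existing ID, otherwise allocate the next one above 0x10FFFF.
        if sequence not in self._ids:
            self._ids[sequence] = self._next
            self._next += 1
        return self._ids[sequence]


def to_char_value(cp: int) -> int:
    """Hypothetical stand-in for the Char class's guard: out-of-range values become U+FFFD."""
    return cp if 0 <= cp <= 0x10FFFF else REPLACEMENT_CHAR
```

For example, a table that sees the illustrative contractions "ch" and then "ll" assigns them 0x110000 and 0x110001. If either ID ever escaped into character data, `to_char_value` would turn it into 0xFFFD. Because these IDs never pass through a UTF-16 or UTF-8 encoder, the encoding limits above do not apply to them.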
Mike
This archive was generated by hypermail 2.1.5 : Sun Jan 21 2007 - 17:06:59 CST