Hello,
on 2012-12-15, Philippe Verdy wrote:
> But there's still a bug (or request for enhancement) in your pocket
> converters:
>
> - For UTF-16 you correctly exclude the range U+D800..U+DFFF (surrogates)
> from the sets of convertible codepoints.
>
> - But you don't exclude this range in your UTF-8 and UTF-32
> "magic encoders", which overlook this case. Of course, your encoders would
> produce distinct sequences for these code points, but those are not valid
> UTF-8 or valid UTF-32 encodings.
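The point raised above can be made concrete with a small UTF-8 encoder that refuses the surrogate range. This is only an illustrative Python sketch (the name `encode_utf8` is hypothetical), not a transcription of any of the pocket encoders; it assumes input is a single integer code point.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8, rejecting surrogates (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are not Unicode scalar values,
        # so no well-formed UTF-8 sequence exists for them.
        raise ValueError(f"U+{cp:04X} is a surrogate code point")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Without the surrogate test, the three-byte branch would happily emit ED A0 80 for U+D800, which is exactly the "distinct but invalid" sequence described above.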
Only the UTF-16 variant is really *my* “magic pocket encoder” (MPE);
the author is named on each of the three.
I would not expect more from those MPEs than converting
a valid UCS character to a valid, equivalent UTF
sequence, and illustrating the underlying algorithm.
I suspect they were originally meant as jokes, at least
partly; I have used them as a teaching aid in my
beginners' lecture on Unicode.
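The algorithm a UTF-16 pocket encoder illustrates, including the exclusion of the surrogate range, can be sketched as follows. This is a hypothetical Python rendering (the name `encode_utf16` is illustrative), returning 16-bit code units as integers rather than bytes.

```python
def encode_utf16(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Lone surrogates cannot be encoded; the range is reserved
        # for the pair mechanism below.
        raise ValueError(f"U+{cp:04X} is a surrogate code point")
    if cp < 0x10000:
        # BMP character: a single code unit equal to the code point.
        return [cp]
    v = cp - 0x10000  # 20-bit value
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

For example, U+1F600 yields the pair D83D DE00.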
Clearly, Mike Ayers made the point that the UTF-32 encoding
is nothing but a simple shortcut (compared with its two
predecessors). His one-row-only MPE expresses this quite
aptly, and any additional branch would spoil the impression.
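To see how little a UTF-32 encoder has to do, and what the "additional branch" would amount to, here is a hypothetical Python sketch. The function name and the `reject_surrogates` parameter are illustrative assumptions; the encoding itself is just the code point written as four big-endian bytes.

```python
def encode_utf32(cp: int, reject_surrogates: bool = True) -> bytes:
    """Encode one code point as UTF-32BE (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the Unicode code space")
    # The one extra branch under discussion: surrogates are not
    # scalar values, so strictly they have no UTF-32 form either.
    if reject_surrogates and 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate code point")
    # The "one row": the code unit is the code point itself.
    return cp.to_bytes(4, "big")
```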
The reason I excluded the surrogates from my UTF-8 MPE
was really that I needed additional space for the user’s
guide on the reverse side.
Cheers,
Otto Stolz
Received on Sun Dec 16 2012 - 06:19:14 CST