In a message dated 2001-06-25 20:19:18 Pacific Daylight Time, gs234@cam.ac.uk 
writes:
>  (For instance, I
>  don't see how it would be possible to encode a sequence of unicode
>  scalar values corresponding to a low and a high surrogate; if you
>  tried to map this back then you would get a single unicode scalar
>  value outside of the BMP).  Perhaps someone on the unicode list could
>  elaborate?
This is the source of my remaining confusion about definition D29.  It 
requires UTFs to round-trip all Unicode code points, and by extension all 
sequences of code points; yet if you use UTF-16 and start with the code point 
sequence <D800 DC00>, you don't get that back -- you end up with <10000>.
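
To make the round-trip problem concrete, here is a minimal sketch in Python. The helper names utf16_encode and utf16_decode are made up for illustration and do not come from any standard or library. The encoder is assumed to be permissive: it passes surrogate code points through as single code units rather than rejecting them.

```python
def utf16_encode(code_points):
    """Map a list of code points to a list of 16-bit code units.

    Assumption: surrogate code points (D800..DFFF) are emitted unchanged
    instead of being rejected, so the ambiguity becomes visible.
    """
    units = []
    for cp in code_points:
        if cp < 0x10000:
            units.append(cp)
        else:
            v = cp - 0x10000
            units.append(0xD800 | (v >> 10))    # high surrogate
            units.append(0xDC00 | (v & 0x3FF))  # low surrogate
    return units


def utf16_decode(units):
    """Map a list of 16-bit code units back to a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # A high surrogate followed by a low surrogate is always
            # read as one supplementary code point.
            code_points.append(
                0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            )
            i += 2
        else:
            code_points.append(u)
            i += 1
    return code_points


original = [0xD800, 0xDC00]
round_tripped = utf16_decode(utf16_encode(original))
print([hex(cp) for cp in round_tripped])  # ['0x10000']
```

The code units produced for <D800 DC00> and for <10000> are identical, so no decoder can tell the two inputs apart.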
The way it was explained to me on this list made it sound as though UTF-16 is 
the "master" UTF that other UTFs have to accommodate.  That didn't make sense 
to me, but I've been trying to cope with it.
Proposed UTFs that are based on UTF-16 code units, and are thus subject to 
the same D29 limitations as UTF-16, really annoy me, though.
-Doug Ewell
 Fullerton, California