Hi Folks,
The book Unicode Demystified says this (page 190, first paragraph):
The surrogate range is divided in half.
The range from U+D800 to U+DBFF contains
the "high surrogates," and the range from
U+DC00 to U+DFFF contains the "low surrogates."
Why are the low surrogates numerically larger than the high surrogates?
That is, why isn't U+D800 to U+DBFF called the low surrogates and U+DC00 to U+DFFF called the high surrogates?
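To make the question concrete, here is a minimal sketch of how UTF-16 splits a supplementary code point into two surrogates. The function name to_surrogate_pair and the constant names are my own, for illustration only. If I read the arithmetic right, "high" and "low" may describe which bits of the code point each unit carries, not the numeric order of the two ranges, but I would welcome confirmation.

```python
# Surrogate ranges as quoted from the book.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000          # a 20-bit value
    high = HIGH_START + (offset >> 10)     # carries the high-order 10 bits
    low = LOW_START + (offset & 0x3FF)     # carries the low-order 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
encoded = chr(0x1F600).encode("utf-16-be")
```

In this sketch the unit holding the high-order bits falls in U+D800 to U+DBFF, and the unit holding the low-order bits falls in U+DC00 to U+DFFF.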
Unicode Technical Report #36, Unicode Security Considerations [1], says:
PEP 383 takes this approach. It enables lossless
conversion to Unicode by converting all "unmappable"
sequences to a sequence of one or more isolated
high surrogate code points. That is, each unmappable
byte's value is a code point whose value is 0xDC00
plus byte value.
Notice "high surrogate" in that quote. I'm confused. I thought the low surrogate range started at 0xDC00, but this document is saying that 0xDC00 + byte value = high surrogate. Is that a typo in the document?
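As a quick check, here is a small sketch using Python's "surrogateescape" error handler, which is what PEP 383 introduced. The sample bytes and variable names are just my own test values. As far as I understand it, only bytes 0x80 and above are escaped this way.

```python
# A byte string whose last byte (0xFF) is not valid UTF-8.
raw = b"abc\xff"

# PEP 383: the undecodable byte becomes a lone surrogate code point.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = ord(text[-1])

# Which half of the surrogate range does the result fall in?
in_high_range = 0xD800 <= escaped <= 0xDBFF
in_low_range = 0xDC00 <= escaped <= 0xDFFF
print(f"U+{escaped:04X} high={in_high_range} low={in_low_range}")

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

The escaped code point here is 0xDC00 plus the byte value, which lies in the range the book calls the low surrogates.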
/Roger
[1] http://www.unicode.org/reports/tr36/#TOC-PEP-383-Approach
Received on Wed Jan 23 2013 - 11:53:49 CST