Why are the low surrogates numerically larger than the high surrogates? from Costello, Roger L. on 2013-01-23 (Unicode Mail List Archive)

From: Costello, Roger L. <costello_at_mitre.org>
Date: Wed, 23 Jan 2013 17:45:38 +0000

Hi Folks,

The book Unicode Demystified says this (page 190, first paragraph):

    The surrogate range is divided in half.
    The range from U+D800 to U+DBFF contains
    the "high surrogates," and the range from
    U+DC00 to U+DFFF contains the "low surrogates."

Why are the low surrogates numerically larger than the high surrogates?

That is, why isn't U+D800 to U+DBFF called the low surrogates and U+DC00 to U+DFFF called the high surrogates?
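
For reference, here is a minimal sketch (Python 3) of how UTF-16 splits a supplementary code point into the two ranges quoted above. The function name to_surrogate_pair and the constant names are made up for illustration and do not come from the book. The sketch only shows the arithmetic: the unit taken from U+D800..U+DBFF carries the high-order 10 bits of the value, and the unit taken from U+DC00..U+DFFF carries the low-order 10 bits.

```python
# Hypothetical names, for illustration only.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # "high surrogates" in the quoted passage
LOW_START, LOW_END = 0xDC00, 0xDFFF     # "low surrogates" in the quoted passage


def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit value, 0x00000..0xFFFFF
    high = HIGH_START + (offset >> 10)   # high-order 10 bits of the offset
    low = LOW_START + (offset & 0x3FF)   # low-order 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"U+{high:04X} U+{low:04X}")       # prints "U+D83D U+DE00"
```

In this sketch "high" and "low" describe which bits of the encoded value each unit holds, not which range is numerically larger.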

Unicode Technical Report #36, Unicode Security Considerations [1], says:

    PEP 383 takes this approach. It enables lossless
    conversion to Unicode by converting all "unmappable"
    sequences to a sequence of one or more isolated
    high surrogate code points. That is, each unmappable
    byte's value is a code point whose value is 0xDC00
    plus byte value.

Notice "high surrogate" in that quote. I'm confused. I thought the low surrogate range started at 0xDC00, but this document is saying that 0xDC00 + byte value = high surrogate. Is that a typo in the document?
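
The arithmetic in that quote can be checked with a short sketch, assuming Python 3, where PEP 383 is exposed as the "surrogateescape" error handler. The helper name escaped_code_points is made up for illustration. It decodes bytes as UTF-8 and lists the code points produced for the bytes that could not be decoded.

```python
def escaped_code_points(raw):
    """Decode raw bytes as UTF-8 with PEP 383's handler; return the escape code points."""
    text = raw.decode("utf-8", errors="surrogateescape")
    # Keep only the code points that fall in the surrogate block U+D800..U+DFFF.
    return [ord(ch) for ch in text if 0xD800 <= ord(ch) <= 0xDFFF]


raw = b"abc\xff\x80"                      # 0xFF and a lone 0x80 are not valid UTF-8
points = escaped_code_points(raw)
print([f"U+{p:04X}" for p in points])     # prints "['U+DCFF', 'U+DC80']"

# Each escape is 0xDC00 plus the byte value, as the quote describes.
assert points == [0xDC00 + 0xFF, 0xDC00 + 0x80]

# The conversion is lossless: encoding with the same handler restores the bytes.
text = raw.decode("utf-8", errors="surrogateescape")
assert text.encode("utf-8", errors="surrogateescape") == raw
```

Both resulting code points lie between U+DC80 and U+DCFF, which is inside the U+DC00 to U+DFFF range that the book passage calls the low surrogates.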

/Roger

[1] http://www.unicode.org/reports/tr36/#TOC-PEP-383-Approach
Received on Wed Jan 23 2013 - 11:53:49 CST
