Pierpaolo Bernardi <olopierpa at gmail dot com> replied to Shriramana
Sharma <samjnaa at gmail dot com>:
>> When extending beyond the BMP and the maximum range of 16-bit
>> codepoints, why was it chosen to go upto 10FFFF and not any more or
>> less?
>
> I think this is because upto 10FFFF is the maximum range supported by
> the preexisting UTF-16 encoding.

The connection to the UTF-16 addressing space is the right answer, but I
don't think the word "preexisting" is quite right. The creation of
UTF-16 and the extension of the capacity of Unicode beyond 64K went
hand-in-hand.

At the time, and for some years afterward, Unicode Technical Standards
and Reports referred to UTF-16 simply as "Unicode" and placed greater
emphasis on the expansion of "UTF-8" as "Unicode Transformation Format,"
that is to say, a transformation of "real" Unicode into some other
format. Characters outside the BMP were often described as "surrogate
pairs" or "surrogate characters," rather than with terms like
"supplementary" that are not tied to the then-dominant UTF-16.

It's true that the upper limit of ISO/IEC 10646 was capped at 10FFFF as
a consequence of the decision to limit the Unicode Standard to what
could be addressed with UTF-16.
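
For anyone who wants to see where 10FFFF comes from, the arithmetic can
be sketched in a few lines of Python. This is only an illustration: the
constant and function names below are made up for the example, while the
surrogate ranges D800..DBFF and DC00..DFFF and the 10000 offset are the
ones UTF-16 defines.

```python
# Illustrative sketch only; names are hypothetical, ranges are UTF-16's.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # 1024 high (lead) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # 1024 low (trail) surrogates
SUPPLEMENTARY_BASE = 0x10000            # first code point beyond the BMP


def decode_pair(high, low):
    """Combine a high and a low surrogate into one supplementary code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


def encode_pair(code_point):
    """Split a supplementary code point into its (high, low) surrogates."""
    if not (SUPPLEMENTARY_BASE <= code_point <= 0x10FFFF):
        raise ValueError("not a supplementary code point")
    offset = code_point - SUPPLEMENTARY_BASE
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


# 1024 high surrogates times 1024 low surrogates gives 16 planes of 64K each.
pair_count = (HIGH_END - HIGH_START + 1) * (LOW_END - LOW_START + 1)

# The last possible pair, DBFF DFFF, lands exactly on the upper limit.
max_code_point = decode_pair(HIGH_END, LOW_END)

print(f"{pair_count:X} pairs, highest code point {max_code_point:X}")
```

So the ceiling is 64K BMP code points plus 1024 x 1024 surrogate
combinations, and the very last combination is 10FFFF.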

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Received on Mon Nov 26 2012 - 10:03:22 CST