Is the number of codepoints in a UTF-16 string well defined?
For example, which of the following two statements is true?
(a) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains two codepoints, U+DC00 and U+10020.
(b) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains three codepoints, U+DC00, U+D800 and U+DC20.
Statement (a) is probably more useful, but I couldn't find anything
that rules out statement (b).
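
To make the two readings concrete, here is a minimal Python sketch of
both. The function names decode_pairing and decode_per_unit are
hypothetical, chosen only for this illustration, and neither is put
forward as the behaviour the standard requires. Reading (a) combines a
high surrogate immediately followed by a low surrogate into one
supplementary codepoint and lets any other surrogate stand alone;
reading (b) treats every code unit of the string as its own codepoint.

```python
def is_high(u):
    """True if u is a high (leading) surrogate code unit."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    """True if u is a low (trailing) surrogate code unit."""
    return 0xDC00 <= u <= 0xDFFF


def decode_pairing(units):
    """Reading (a): combine each high+low pair, keep lone surrogates as-is."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            # Standard surrogate-pair arithmetic for a supplementary codepoint.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


def decode_per_unit(units):
    """Reading (b): every code unit counts as one codepoint, no pairing."""
    return list(units)


example = [0xDC00, 0xD800, 0xDC20]
print([f"U+{cp:04X}" for cp in decode_pairing(example)])
print([f"U+{cp:04X}" for cp in decode_per_unit(example)])
```

Under these assumptions the first call gives two codepoints and the
second gives three, so the count depends entirely on which reading is
chosen.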
Richard.
Received on Sun Oct 11 2015 - 16:21:57 CDT