RE: Is the binaryness/textness of a data format a property? from Doug Ewell via Unicode on 2020-03-21 (Unicode Mail List Archive)

From: Doug Ewell via Unicode <unicode_at_unicode.org>
Date: Sat, 21 Mar 2020 13:33:18 -0600

Eli Zaretskii wrote:

>>> Also, UTF-8 can carry more than Unicode -- for example,
>>> U+D800..U+DFFF or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or
>>> 2⁴²), which has its uses but is not well-formed Unicode.
>>
>> I'd be interested in your elaboration on what these uses are.
>
> Emacs uses some of that for supporting charsets that cannot be mapped
> into Unicode. GB18030 is one example of such charsets. The internal
> representation of characters in Emacs is UTF-8, so it uses 5-byte
> UTF-8 like sequences to represent such characters.

When 137,468 private-use characters aren't enough?
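
For reference, the 137,468 figure is the sum of the three private-use ranges. A minimal sketch of the arithmetic, assuming the standard range boundaries (the names PUA_RANGES and pua_total are only illustrative):

```python
# Private-use ranges in Unicode, as inclusive code point bounds.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def pua_total(ranges=PUA_RANGES):
    """Count the code points covered by the given inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


if __name__ == "__main__":
    print(pua_total())  # 6,400 + 65,534 + 65,534
```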

I thought the whole premise of GB18030 was that it was Unicode mapped into a GB2312 framework. What characters exist in GB18030 that don't exist in Unicode? Have they been proposed for Unicode yet? And why was none of the PUA space considered appropriate for them in the meantime?
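
To make the "5-byte UTF-8 like sequences" in the quoted text concrete, here is a rough sketch of the original UTF-8 bit layout (the pre-RFC 3629 scheme, which allowed up to 6 bytes and 31 bits). This is not Emacs's actual code, and the function name encode_extended_utf8 is made up for illustration. It deliberately skips the well-formedness checks that real UTF-8 requires, so surrogates and values above U+10FFFF pass through.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode an integer using the original 1..6 byte UTF-8 bit layout.

    Illustrative only: no rejection of surrogates or values above
    U+10FFFF, so the output is not necessarily well-formed UTF-8.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])

    # (largest value, total bytes, lead-byte prefix) for each length.
    layouts = (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    )
    for limit, length, prefix in layouts:
        if cp <= limit:
            break

    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6

    # Whatever bits remain go into the lead byte.
    return bytes([prefix | cp] + tail[::-1])
```

For values inside Unicode the result matches ordinary UTF-8, e.g. encode_extended_utf8(0x20AC) gives the same three bytes as "€".encode("utf-8"). Values from 0x200000 through 0x3FFFFFF come out as 5-byte sequences with a lead byte in the range F8..FB.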

--
Doug Ewell | Thornton, CO, US | ewellic.org

Received on Sat Mar 21 2020 - 14:33:42 CDT
