Re: Unicode String Models

From: Eli Zaretskii via Unicode <unicode_at_unicode.org>
Date: Wed, 12 Sep 2018 05:34:21 +0300

> Date: Wed, 12 Sep 2018 00:13:52 +0200
> Cc: unicode_at_unicode.org
> From: Hans Åberg via Unicode <unicode_at_unicode.org>
>
> It might be useful to represent non-UTF-8 bytes as Unicode code points. One way might be to use a codepoint to indicate high bit set followed by the byte value with its high bit set to 0, that is, truncated into the ASCII range. For example, U+0080 looks like it is not in use, though I could not verify this.

You must use a codepoint that is not defined by Unicode and never
will be. That is what Emacs does: it extends the Unicode codepoint
space beyond 0x10FFFF.
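
To make the idea concrete, here is a minimal sketch in Python of mapping
undecodable bytes to codepoints above 0x10FFFF and back. It is an
illustration only, not Emacs's actual code: the names RAW_BYTE_BASE,
decode_lossless and encode_lossless are hypothetical, and the base value
0x3FFF00 is an assumption modelled on my understanding of the range Emacs
uses for raw bytes. Python strings cannot hold values above 0x10FFFF, so
the text is kept as a list of integers.

```python
# Hypothetical base for raw bytes, placed above the Unicode limit 0x10FFFF.
# Assumption: byte 0x80 maps to 0x3FFF80 and byte 0xFF maps to 0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


def decode_lossless(data: bytes) -> list[int]:
    """Decode UTF-8; map each undecodable byte to RAW_BYTE_BASE + byte."""
    out = []
    i = 0
    while i < len(data):
        # Try the shortest valid UTF-8 sequence starting at position i.
        for n in (1, 2, 3, 4):
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue
            out.append(ord(ch))
            i += n
            break
        else:
            # No valid sequence starts here: keep the byte as an
            # out-of-range codepoint so that nothing is lost.
            out.append(RAW_BYTE_BASE + data[i])
            i += 1
    return out


def encode_lossless(codepoints: list[int]) -> bytes:
    """Inverse of decode_lossless: restore raw bytes, encode the rest."""
    out = bytearray()
    for cp in codepoints:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            out.append(cp - RAW_BYTE_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)


data = b"caf\xc3\xa9 \xff\x80"      # valid UTF-8 followed by two stray bytes
cps = decode_lossless(data)
assert encode_lossless(cps) == data  # the round trip preserves every byte
```

Because the raw-byte values lie outside the Unicode range, they cannot
collide with any character Unicode defines now or later, which is the
property that U+0080 lacks.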
Received on Tue Sep 11 2018 - 21:34:39 CDT
