Re: Unicode String Models

From: Hans Åberg via Unicode <unicode_at_unicode.org>
Date: Wed, 12 Sep 2018 00:13:52 +0200

> On 11 Sep 2018, at 23:48, Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:
>
> On Tue, 11 Sep 2018 21:10:03 +0200
> Hans Åberg via Unicode <unicode_at_unicode.org> wrote:
>
>> Indeed, before UTF-8, in the 1990s, I recall some Russians using
>> LaTeX files with sections in different Cyrillic and Latin encodings,
>> changing the editor encoding while typing.
>
> Rather like some of the old Unicode list archives, which are just
> concatenations of a month's emails, with all sorts of 8-bit encodings
> and stretches of base64.

It might be useful to represent non-UTF-8 bytes as Unicode code points. One way might be to use a code point to indicate "high bit set", followed by the byte value with its high bit cleared, that is, truncated into the ASCII range. For example, U+0080 looks like it is not in use, though I could not verify this.
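
A minimal Python sketch of this idea follows. It assumes UTF-8 as the base encoding and U+0080 as the marker, as suggested above. The handler name "marker-escape" and the functions bytes_to_text and text_to_bytes are hypothetical. Every byte that UTF-8 rejects lies in 0x80-0xFF, so clearing its high bit always gives an ASCII value, and setting the bit again restores the byte. Because the scheme relies on U+0080 being otherwise unused, the sketch refuses input that already contains a valid encoding of it (the bytes C2 80).

```python
import codecs

# Marker code point meaning "high bit set"; assumed unused in real text.
MARKER = "\u0080"


def _escape_invalid_byte(exc):
    """Decode error handler: replace one rejected byte with MARKER + low 7 bits."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    byte = exc.object[exc.start]
    # Bytes 0x00-0x7F are always valid UTF-8, so a rejected byte has its
    # high bit set; clearing it maps the byte into the ASCII range.
    replacement = MARKER + chr(byte & 0x7F)
    # Escape a single byte and resume right after it, so a rejected
    # multi-byte prefix is escaped one byte at a time.
    return replacement, exc.start + 1


# Hypothetical handler name, registered with the standard codecs machinery.
codecs.register_error("marker-escape", _escape_invalid_byte)


def bytes_to_text(data: bytes) -> str:
    """Decode UTF-8, escaping each non-UTF-8 byte as MARKER + (byte & 0x7F)."""
    if b"\xc2\x80" in data:
        # A genuine U+0080 would be indistinguishable from the marker.
        raise ValueError("input already contains U+0080")
    return data.decode("utf-8", errors="marker-escape")


def text_to_bytes(text: str) -> bytes:
    """Invert bytes_to_text: MARKER + c becomes the byte ord(c) | 0x80."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == MARKER:
            if i + 1 >= len(text) or ord(text[i + 1]) >= 0x80:
                raise ValueError(f"malformed escape at index {i}")
            out.append(ord(text[i + 1]) | 0x80)  # restore the high bit
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)
```

For example, bytes_to_text(b"\xc5berg"), the Latin-1 spelling of "Åberg", returns "\x80Eberg", and text_to_bytes turns that back into the original bytes. A file mixing UTF-8 and 8-bit encodings, like the list archives mentioned above, then round-trips without loss.
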
Received on Tue Sep 11 2018 - 17:14:14 CDT