Re: Unicode String Models from Hans Åberg via Unicode on 2018-09-11 (Unicode Mail List Archive)

From: Hans Åberg via Unicode <unicode_at_unicode.org>
Date: Tue, 11 Sep 2018 20:14:30 +0200

> On 11 Sep 2018, at 19:21, Eli Zaretskii <eliz_at_gnu.org> wrote:
>
>> From: Hans Åberg <haberg-1_at_telia.com>
>> Date: Tue, 11 Sep 2018 19:13:28 +0200
>> Cc: Henri Sivonen <hsivonen_at_hsivonen.fi>,
>> unicode_at_unicode.org
>>
>>> In Emacs, each raw byte belonging
>>> to a byte sequence which is invalid under UTF-8 is represented as a
>>> special multibyte sequence. IOW, Emacs's internal representation
>>> extends UTF-8 with multibyte sequences it uses to represent raw bytes.
>>> This allows mixing stray bytes and valid text in the same buffer,
>>> without risking lossy conversions (such as those one gets under model
>>> 2 above).
>>
>> Can you give a reference detailing this format?
>
> There's no formal description as English text, if that's what you
> meant. The comments, macros and functions in the files
> src/character.[ch] in the Emacs source tree tell most of that story,
> albeit indirectly, and some additional info can be found in the
> section "Text Representation" of the Emacs Lisp Reference manual.

OK. If one encounters a file with mixed encodings, it is good to be able to view its contents and then convert it, as I see one can do in Emacs.
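
For readers who want to try the view-then-convert workflow outside Emacs, here is a minimal sketch in Python. It is not Emacs's internal format, which this thread does not spell out. It uses Python's "surrogateescape" error handler (PEP 383) instead, which works on a similar principle: each byte that is invalid as UTF-8 is kept as a distinct placeholder code point (U+DC80..U+DCFF), so the original bytes can be recovered exactly. The function names and the sample data are made up for illustration, and the repair step assumes the stray bytes are Latin-1.

```python
from typing import List


def decode_lossless(data: bytes) -> str:
    # Valid UTF-8 decodes normally; each invalid byte 0xNN becomes U+DCNN.
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Inverse of decode_lossless: U+DCNN placeholders turn back into byte 0xNN.
    return text.encode("utf-8", errors="surrogateescape")


def stray_byte_positions(text: str) -> List[int]:
    # Indices of the characters that stand in for undecodable raw bytes.
    return [i for i, ch in enumerate(text) if 0xDC80 <= ord(ch) <= 0xDCFF]


def repair_as_latin1(text: str) -> str:
    # Assumption for this example: the stray bytes are Latin-1, so byte 0xNN
    # maps directly to code point U+00NN.
    return "".join(
        chr(ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )


# Hypothetical mixed-encoding input: a UTF-8 run followed by a Latin-1 run.
mixed = "Åberg ".encode("utf-8") + "Åberg".encode("latin-1")

text = decode_lossless(mixed)        # view: nothing is lost or replaced
strays = stray_byte_positions(text)  # locate the bytes that need attention
repaired = repair_as_latin1(text)    # convert, under the Latin-1 assumption
```

Encoding `text` back with `encode_lossless` reproduces `mixed` byte for byte, which is the property that a plain replacement-character conversion loses.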
Received on Tue Sep 11 2018 - 13:14:51 CDT