Re: Unicode String Models from Eli Zaretskii via Unicode on 2018-09-11 (Unicode Mail List Archive)

From: Eli Zaretskii via Unicode <unicode_at_unicode.org>
Date: Tue, 11 Sep 2018 20:21:07 +0300

> From: Hans Åberg <haberg-1_at_telia.com>
> Date: Tue, 11 Sep 2018 19:13:28 +0200
> Cc: Henri Sivonen <hsivonen_at_hsivonen.fi>,
> unicode_at_unicode.org
>
> > In Emacs, each raw byte belonging
> > to a byte sequence which is invalid under UTF-8 is represented as a
> > special multibyte sequence. IOW, Emacs's internal representation
> > extends UTF-8 with multibyte sequences it uses to represent raw bytes.
> > This allows mixing stray bytes and valid text in the same buffer,
> > without risking lossy conversions (such as those one gets under model
> > 2 above).
>
> Can you give a reference detailing this format?

There's no formal description as English text, if that's what you
meant. The comments, macros and functions in the files
src/character.[ch] in the Emacs source tree tell most of that story,
albeit indirectly, and some additional info can be found in the
section "Text Representation" of the Emacs Lisp Reference manual.
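
For illustration only, the idea in the quoted paragraph can be sketched in a few lines of Python. This is not Emacs's code. It assumes, based on one reading of the BYTE8 macros in src/character.h, that a raw byte in the range 0x80..0xFF is stored as a two-byte sequence whose lead byte is 0xC0 or 0xC1, values that never occur in valid UTF-8. The names to_internal, from_internal and _seq_len are hypothetical, and validity is decided by Python's strict UTF-8 decoder, which may not match Emacs's own rules in every corner case.

```python
def _seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if it cannot lead."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation bytes, 0xC0, 0xC1, 0xF5..0xFF


def to_internal(data: bytes) -> bytes:
    """Copy valid UTF-8 through unchanged; escape every other byte (assumed scheme)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = _seq_len(data[i])
        chunk = data[i:i + n]
        valid = n > 0 and len(chunk) == n
        if valid:
            try:
                chunk.decode("utf-8")  # strict: rejects overlongs and surrogates
            except UnicodeDecodeError:
                valid = False
        if valid:
            out += chunk
            i += n
        else:
            b = data[i]  # always >= 0x80 here, since ASCII is always valid
            out.append(0xC0 | ((b >> 6) & 1))  # lead byte 0xC0 or 0xC1
            out.append(0x80 | (b & 0x3F))      # low six bits of the raw byte
            i += 1
    return bytes(out)


def from_internal(buf: bytes) -> bytes:
    """Invert to_internal: turn each escape pair back into its raw byte."""
    out = bytearray()
    i = 0
    while i < len(buf):
        b = buf[i]
        if b in (0xC0, 0xC1) and i + 1 < len(buf) and 0x80 <= buf[i + 1] <= 0xBF:
            out.append(0x80 | ((b & 1) << 6) | (buf[i + 1] & 0x3F))
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


# Valid text, a stray 0xFF, and a truncated three-byte sequence in one buffer.
mixed = "Å".encode("utf-8") + b"\xff" + b"abc" + b"\xe2\x82"
assert from_internal(to_internal(mixed)) == mixed  # nothing is lost
```

Because 0xC0 and 0xC1 cannot appear in well-formed UTF-8, the escape pairs are unambiguous. The conversion round-trips any byte string, whereas replacing invalid bytes with U+FFFD loses them.
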
Received on Tue Sep 11 2018 - 12:21:30 CDT
