Re: Why 17 planes? from Doug Ewell on 2012-11-29 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Thu, 29 Nov 2012 08:47:31 -0700

William_J_G Overington <wjgo underscore 10009 at btinternet dot com>
wrote:

>> Do NOT try to make this system conceptually part of Unicode.
>
> Well, consider please the following example, from a simulation, of the
> text of a plain text email.

> 
> 
> Margaret Gattenford
> [...]

Embedding these items in "the text of a plain text email" is an attempt,
intentional or not, to make the system conceptually part of Unicode.
Don't do that.

> So, I am not suggesting that the items are characters, I am suggesting
> that they are encoded as if they are characters.
>
> I feel that if a collection of such items were encoded into Unicode/
> 10646 as if they were characters, possibly in plane 13, that that would
> be good.

But Unicode/10646 isn't a standard for encoding non-character items "as
if they were characters." It's a standard for encoding characters.

>> Do NOT imagine that creating a font with glyphs for these elements
>> makes them characters.
>
> Well, it does not make the items characters. However, they are encoded
> as if they are characters so as to be able to use them intermixed with
> characters so that they can be sent in plain text emails and could be
> used with a specially adapted email reading system so as to produce a
> localized screen display that is totally characters.

You mean like an e-mail or text messaging system that replaces <003A
0029> (colon, right parenthesis) with <263A> (smiley face)? You don't
need special code points for that. Just create a markup language with
normal characters, and have your "specially adapted email reading
system" interpret the markup sequences and convert them to normal text:

[+10+]
[+200+]
Margaret Gattenford
...
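
A minimal sketch of such a reader-side interpreter, in Python, might look like the following. The [+N+] syntax is taken from the example above; the catalog contents, the meanings assigned to codes 10 and 200, and the function names are assumptions invented purely for illustration, since the thread never defines them.

```python
import re

# Hypothetical message catalog. The codes come from the example above, but
# the sentences attached to them are invented for illustration only.
CATALOG = {
    "en": {10: "Good day.", 200: "The weather here is fine."},
    "fr": {10: "Bonjour.", 200: "Il fait beau ici."},
}

# Matches markup sequences such as [+10+], built only from ordinary characters.
MARKUP = re.compile(r"\[\+(\d+)\+\]")


def localize(text, lang):
    """Replace each [+N+] sequence with the catalog sentence for `lang`."""
    messages = CATALOG[lang]

    def expand(match):
        code = int(match.group(1))
        # Unknown codes are left untouched so no information is lost.
        return messages.get(code, match.group(0))

    return MARKUP.sub(expand, text)


def replace_emoticon(text):
    """Replace <003A 0029> (colon, right parenthesis) with <263A>."""
    return text.replace("\u003A\u0029", "\u263A")


if __name__ == "__main__":
    mail = "[+10+]\n[+200+]\nMargaret Gattenford"
    print(localize(mail, "fr"))
```

The message itself contains nothing but ordinary characters, so it passes through any mail system unchanged; only the reading program needs to know the catalog.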

> Yet, please consider the emoji that are encoded. For many of them, the
> items are depicted as pictures and localization takes place in the
> mind of the end user viewing the picture.

Emoji start out as pictures and remain pictures. Interpretation is out
of scope. Please consider the well-known "chat" example.

>> There are plenty of great standards out there for encoding things
>> that are NOT characters. Please feel free to add to THAT idea space.
>
> Yet the items for which I am making, at a research level and not at a
> standardization level, definitions and encodings are for use
> intermixed with Unicode/10646 plain text. So I am using the technology
> that is best for the task.

Intermixed with plain text, fine. There are existing standards, and
non-standardized experiments, that combine plain text with non-textual
content. Please consider word processor documents: they have Unicode
text together with pictures, formatting, indexing hints, locked fields,
and more. The key is that they integrate the Unicode text INTO their
format; they don't try to integrate their format into Unicode. There is
no Unicode character, private-use or otherwise, for "format this block
of text into three columns of widths 7 cm, 4 cm, and 4 cm, with a gap of
0.5 cm between each column, repeating the header at the top of each
column," and no, there should not ever be one.

Your statement that this is all "at a research level and not at a
standardization level" demonstrates clearly why none of this should be
considered for a character encoding STANDARD.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Received on Thu Nov 29 2012 - 09:49:13 CST
