From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Aug 29 2009 - 04:49:12 CDT
On 8/28/2009 11:26 PM, William_J_G Overington wrote:
> It is now clear to me that hypersurrogates as I was thinking of them yesterday could not be encoded in a future version of Unicode.
>
> I have thought further on the problem and have thought that, even though use of codes above U+10FFFF is not possible, there is an alternative mechanism that could solve the encoding problem without breaking "interoperability between UTF-16, UTF-8, and UTF-32 forms of the encoding".
>
> Suppose that such items as logos and personal Gaiji could each be encoded as a sequence of two Unicode codepoints, one from plane 10 followed by one from plane 11 and that such a sequence would not imply any other codepoint, it would just be an ordered sequence of two codepoints, so that the character would be encoded at a point within a two-dimensional space.
In principle, you can reference (let's leave the word "encode" out of
this for a while) anything by labeling it with a string. HTML uses
entities, such as &quot;, where the string is delimited by the two
characters "&" and ";". Many bulletin board and email implementations
will interpret strings like :) as glyphs (you may not see the ":"
followed by ")" here). These strings have no standard escape characters.
Some support more regular string formats as well, as in ":shocked:". So
that idea is not new.
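Here is a minimal sketch of the ":shocked:" style of labeling. The label table, its two entries, and the function name are hypothetical and chosen only for illustration. The point is that the label stays an ordinary run of characters, and only the application that chooses to recognize it attaches a meaning to it.

```python
import re

# Hypothetical label table; names and targets are illustrative only.
LABELS = {
    "shocked": "\U0001F632",  # ASTONISHED FACE
    "smile": "\U0001F642",    # SLIGHTLY SMILING FACE
}

# A label is a run of lowercase letters between two colons, e.g. ":shocked:".
LABEL_RE = re.compile(r":([a-z]+):")

def expand_labels(text: str) -> str:
    """Replace known :name: labels; leave unknown ones as plain text."""
    def substitute(match: re.Match) -> str:
        return LABELS.get(match.group(1), match.group(0))
    return LABEL_RE.sub(substitute, text)

print(expand_labels("I was :shocked: and :confused:"))
```

Software that doesn't know the convention just shows the colons and letters, which is exactly what makes it a higher-level protocol.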
Common to all of these approaches is that they are *higher* level
protocols. There's always a more basic level where "&" is just the
character "&", and only when you claim to support HTML do you have to
turn the real "&" into "&amp;" to distinguish it from the syntax character.
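The layering can be seen with Python's standard html module. This is a small sketch using an arbitrary sample string. Escaping is needed only when text enters the HTML layer. Below that layer, "&" is an ordinary character.

```python
import html

raw = 'Tom & Jerry say "hi"'

# Entering the HTML layer: the literal "&" must be escaped so that it is
# not mistaken for the start of an entity such as &quot;.
markup = html.escape(raw)
print(markup)  # Tom &amp; Jerry say &quot;hi&quot;

# Leaving the HTML layer restores the plain characters.
assert html.unescape(markup) == raw
```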
However, if you start reserving code points on certain planes for such a
scheme and attempt to enlist the Unicode Consortium (owners of an
encoding standard) into this process, it's impossible for people *not*
to perceive this as *encoding*. That is the reason why these kinds of
proposals will forever be non-starters in *this* particular context.
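To make the quoted pair idea concrete, here is a minimal sketch that assumes a hypothetical mapping. Grid cell (x, y) becomes U+A0000+x followed by U+B0000+y. The function names and the mapping are illustrative only and are not part of the proposal or of any standard. The sketch shows that such a pair is just an ordinary two-character string, valid in UTF-8, UTF-16, and UTF-32 alike. Nothing about it threatens interoperability. The question is only whether reserving the planes for it would be read as encoding.

```python
PLANE_10_BASE = 0xA0000  # first code point of plane 10
PLANE_11_BASE = 0xB0000  # first code point of plane 11
PLANE_SIZE = 0x10000     # code points per plane

def pair_to_label(x: int, y: int) -> str:
    """Hypothetical: label grid cell (x, y) with a two-code-point string."""
    if not (0 <= x < PLANE_SIZE and 0 <= y < PLANE_SIZE):
        raise ValueError("each coordinate must fit within one plane")
    return chr(PLANE_10_BASE + x) + chr(PLANE_11_BASE + y)

def label_to_pair(label: str):
    """Inverse of pair_to_label; rejects anything that is not such a pair."""
    if len(label) != 2:
        raise ValueError("expected exactly two code points")
    x = ord(label[0]) - PLANE_10_BASE
    y = ord(label[1]) - PLANE_11_BASE
    if not (0 <= x < PLANE_SIZE and 0 <= y < PLANE_SIZE):
        raise ValueError("not a plane 10 + plane 11 pair")
    return x, y

label = pair_to_label(0x1234, 0x0042)
# Code units per encoding form: 4 + 4 bytes in UTF-8, two surrogate
# pairs in UTF-16, one unit per code point in UTF-32.
print(len(label.encode("utf-8")),
      len(label.encode("utf-16-le")) // 2,
      len(label.encode("utf-32-le")) // 4)  # 8 4 2
print(PLANE_SIZE * PLANE_SIZE)  # 4294967296 distinct pairs
```

Seen this way, the "two-dimensional space" is simply the set of two-character strings drawn from two particular planes.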
> If an even larger codespace were required,
That's the other reason: after 20 years of effort, the Consortium has
barely managed to encode as many characters as there are "private use"
code points in the standard. Your worries about code space extensions
are premature. Really.
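For scale, this sketch totals the private-use code points from the three ranges the Unicode Standard designates for private use. It then cross-checks that total against Python's Unicode database, where such code points have general category "Co". It does not attempt to reproduce the count of assigned characters as of 2009. That comparison is the author's.

```python
import unicodedata

# Private-use ranges designated by the Unicode Standard.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

by_range = sum(hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES)
print(by_range)  # 137468  (6400 + 65534 + 65534)

# Cross-check against Python's Unicode database (general category "Co").
by_category = sum(
    1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Co"
)
assert by_category == by_range
```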
A./