From: Michael D'Errico (mike-list@pobox.com)
Date: Fri Jan 09 2009 - 01:38:56 CST
> Your suggestion, Michael, is to modify how the Unicode standard works in
> order to encode emoji and similar non-text content in a flexible and
> extensible way. My suggestion is that this content belongs in a
> different standard altogether, one that is focused on non-text content.
I've thought about this. But since you would want to intermix text
and non-text, it makes sense to retain Unicode as a subset and use
the same UTF encoding schemes. The problem, though, is that Unicode
claims all the code points, so a new standard would have to violate
the rules, either by using planes that Unicode will probably never
use(*), or by going beyond plane 16 (which is impossible with UTF-16
and specifically disallowed for UTF-8 and UTF-32 conformance).
Personally, I would choose the latter approach and just say that you
can't use UTF-16. UTF-8, even limited to 4 bytes, can encode a total
of 32 planes, so there would be lots of initial room. Expanding it
to 6 bytes as it was originally specified handles 32k planes.
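The plane counts above follow from the number of payload bits each UTF-8 sequence length carries. A minimal sketch of that arithmetic, assuming the original 1-to-6-byte UTF-8 bit layout (the names PLANE_SIZE, LEAD_BITS, payload_bits and planes_reachable are illustrative, not from any library):

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Payload bits carried by the lead byte for sequence lengths 1..6 in the
# original UTF-8 design; every continuation byte carries 6 more bits.
LEAD_BITS = {1: 7, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}

def payload_bits(length):
    """Total payload bits in a UTF-8 sequence of the given byte length."""
    return LEAD_BITS[length] + 6 * (length - 1)

def planes_reachable(length):
    """Number of 64K planes addressable by sequences of this length."""
    return (1 << payload_bits(length)) // PLANE_SIZE

for n in (4, 6):
    print(n, "bytes:", payload_bits(n), "bits,", planes_reachable(n), "planes")
```

Four bytes give 21 bits (32 planes) and six bytes give 31 bits (32,768 planes). Conformant UTF-8 today stops at U+10FFFF, so anything above plane 16 in this sketch is outside the current standard.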
The problem with moving beyond the reach of UTF-16 is that some
programming languages designed their String classes to hold UTF-16
code units, and would therefore not be able to access the non-text
content. This is probably the biggest roadblock to a solution
outside of Unicode, and means that either Unicode would have to give
up some of its code space to a new standard, or embrace the ideas
and make them a part of Unicode.
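To see why UTF-16 cannot reach past plane 16, here is a rough sketch of the standard surrogate-pair decoding formula; the function name decode_surrogate_pair and the constant names are illustrative only:

```python
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF  # high (lead) surrogate range
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF    # low (trail) surrogate range

def decode_surrogate_pair(high, low):
    """Combine two UTF-16 code units into one supplementary code point."""
    if not (HIGH_FIRST <= high <= HIGH_LAST and LOW_FIRST <= low <= LOW_LAST):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - HIGH_FIRST) << 10) + (low - LOW_FIRST)

# The largest possible pair gives the highest code point UTF-16 can express.
max_code_point = decode_surrogate_pair(HIGH_LAST, LOW_LAST)
print(hex(max_code_point), "is in plane", max_code_point >> 16)
```

The two surrogates carry only 20 bits between them, so the ceiling is U+10FFFF, the last code point of plane 16. A String class built on 16-bit units has no way to represent anything higher.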
Well I won't be holding my breath....
Mike
*Whistler's Conjecture states that no characters will ever be encoded
beyond plane 2.