From: vunzndi@vfemail.net
Date: Fri Jan 09 2009 - 04:18:17 CST
Quoting "Michael D'Errico" <mike-list@pobox.com>:
>> Your suggestion, Michael, is to modify how the Unicode standard
>> works in order to encode emoji and similar non-text content in a
>> flexible and extensible way. My suggestion is that this content
>> belongs in a different standard altogether, one that is focused on
>> non-text content.
>
> I've thought about this. But since you would want to intermix text
> and non-text, it makes sense to retain Unicode as a subset and use
> the same UTF encoding schemes. The problem, though, is that Unicode
> claims all the code points, so a new standard would have to violate
> the rules, either by using planes that Unicode will probably never
> use(*), or by going beyond plane 16 (which is impossible with UTF-16
> and specifically disallowed for UTF-8 and UTF-32 conformance).
>
> Personally, I would choose the latter approach and just say that you
> can't use UTF-16. UTF-8, even limited to 4 bytes, can encode a total
> of 32 planes, so there would be lots of initial room. Expanding it
> to 6 bytes as it was originally specified handles 32k planes.
>
> The problem with moving beyond the reach of UTF-16 is that some
> programming languages designed their String classes to hold UTF-16
> code units, and would therefore not be able to access the non-text
> content. This is probably the biggest roadblock to a solution
> outside of Unicode, and means that either Unicode would have to give
> up some of its code space to a new standard, or embrace the ideas
> and make it a part of Unicode.
>
Extending beyond plane 16 would not be that difficult, but with only
25% of the existing planes allocated, there is no danger of filling
them all in the near future, or even in the next few decades.
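
The plane counts mentioned in this thread can be checked with a little
arithmetic. What follows is a minimal Python sketch, not part of the
original exchange: the helper names (utf8_planes, utf16_planes,
plane_of) are made up for illustration, and the payload-bit table
assumes the original UTF-8 layout that allowed sequences of up to 6
bytes.

```python
# Sketch only: these helper names are illustrative, not from any library.

PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Payload bits carried by a UTF-8 sequence of each length, under the
# original scheme that allowed up to 6 bytes (assumption stated above).
UTF8_PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}


def utf8_planes(max_bytes: int) -> int:
    """Planes addressable if sequences of up to max_bytes bytes are allowed."""
    return (1 << UTF8_PAYLOAD_BITS[max_bytes]) // PLANE_SIZE


def utf16_planes() -> int:
    """Planes addressable by UTF-16: the BMP plus 1024 * 1024 surrogate pairs."""
    supplementary = 0x400 * 0x400  # high surrogates times low surrogates
    return 1 + supplementary // PLANE_SIZE


def plane_of(code_point: int) -> int:
    """Plane number of a code point (plane 0 is the BMP)."""
    return code_point >> 16


print(utf8_planes(4))      # 32
print(utf8_planes(6))      # 32768
print(utf16_planes())      # 17
print(plane_of(0x10FFFF))  # 16
print(plane_of(0x30000))   # 3
```

The last two lines show why U+10FFFF is the ceiling for UTF-16 (it is
the final code point of plane 16) and that plane 3 starts at U+30000.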
>
> Well I won't be holding my breath....
>
> Mike
>
> *Whistler's Conjecture states that no characters will ever be encoded
> beyond plane 2.
>
Plane 3 is now roadmapped for ideographs and named the Tertiary
Ideographic Plane (TIP), so it will certainly have characters in it.
John Knightley