Re: Proposing UTF-21/24

From: Frank Ellermann (nobody@xyzzy.claranet.de)
Date: Sat Jan 20 2007 - 19:32:59 CST

Next message: David Starner: "Re: Proposing UTF-21/24"

Previous message: Ruszlan Gaszanov: "Proposing UTF-21/24"
In reply to: Ruszlan Gaszanov: "Proposing UTF-21/24"
Next in thread: Ruszlan Gaszanov: "RE: Proposing UTF-21/24"
Reply: Ruszlan Gaszanov: "RE: Proposing UTF-21/24"

Ruszlan Gaszanov wrote:

> Any comments?

Some of your arguments, like "won't need a BOM anymore", don't make
sense to me, but the UTF-24A idea is nice. Even if I'm lost in
a sequence of UTF-24A octets I can always find the start or end
of a UTF-24A code point: 1P0 can be 100 or 110, therefore a byte
with MSB 1 is the start unless the previous byte also has MSB 1,
in which case the previous byte is the start. Similarly LSB 0 could
be used to determine the end.
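
The resynchronization rule can be sketched in a few lines of Python.
This is only an illustration under an assumed layout: the marker bits
1, P, 0 are taken to be the top bit of the first, second and third
octet of each unit, and the input is taken to be a well-formed run of
complete 3-octet units. The name find_start is made up for this sketch.

```python
def find_start(data: bytes, i: int) -> int:
    """Return the index of the first octet of the unit containing data[i].

    Assumed layout (hypothetical, not taken from the proposal text):
    octet 0 has top bit 1, octet 1 has top bit P, octet 2 has top bit 0.
    The input must consist of complete 3-octet units.
    """
    def msb(k: int) -> int:
        return data[k] >> 7

    if msb(i) == 1:
        # Either the start octet, or the middle octet with P = 1.
        # A start octet is preceded by an end octet (top bit 0) or by nothing,
        # so a preceding top bit of 1 means data[i] is the middle octet.
        if i > 0 and msb(i - 1) == 1:
            return i - 1
        return i

    # Top bit 0: either the end octet, or the middle octet with P = 0.
    # A middle octet is always followed by an end octet (top bit 0).
    if i + 1 < len(data) and msb(i + 1) == 0:
        return i - 1  # data[i] is the middle octet
    return i - 2      # data[i] is the end octet


# Two units: marker bits 1,1,0 followed by 1,0,0.
sample = bytes([0x81, 0x82, 0x03, 0xFF, 0x7F, 0x7F])
starts = [find_start(sample, i) for i in range(len(sample))]
```

Looking at one neighbouring octet is enough in every case, which is
the property that makes the scheme self-synchronizing.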

One disadvantage of your scheme: unlike UTF-8 it can't be directly
expressed in CharmapML. The parity bit destroys simple patterns,
and an enumeration of 2**21 (minus surrogates) code points won't
fly. But BOCU-1 has the same issue, so that's no showstopper.

Maybe you could use a trick: instead of 1P0, use 100 and 110 for
UTF-24E (even) and UTF-24O (odd) CharmapML descriptions, plus a
comment that one half of the real UTF-24 corresponds to UTF-24E,
and the other half to UTF-24O.
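
To make the split concrete, here is a rough Python sketch that sorts a
code point into one of the two halves. Everything about the encoding
here is an assumption for illustration only: seven payload bits per
octet, marker bits in the top bit of each octet, and P set to 1 when
the 21-bit value has an odd number of 1 bits. The real parity rule may
differ, and split_half is a hypothetical name.

```python
def split_half(cp: int):
    """Return (half_name, octets) for a code point under the assumed layout."""
    if not 0 <= cp < 2**21:
        raise ValueError("code point does not fit in 21 bits")

    # Assumed parity rule: P is 1 for an odd number of 1 bits in cp.
    p = bin(cp).count("1") & 1

    b0 = 0x80 | (cp >> 14)               # marker 1 + top 7 payload bits
    b1 = (p << 7) | ((cp >> 7) & 0x7F)   # marker P + middle 7 payload bits
    b2 = cp & 0x7F                       # marker 0 + low 7 payload bits

    # Marker pattern 100 goes to the "even" table, 110 to the "odd" table.
    name = "UTF-24O" if p else "UTF-24E"
    return name, bytes([b0, b1, b2])


half_a, octets_a = split_half(0x41)  # two 1 bits
half_b, octets_b = split_half(0x40)  # one 1 bit
```

Within each half the marker bits are constant, so each half on its own
has the kind of simple pattern that the combined scheme lacks.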

Compare <http://purl.net/xyzzy/home/test/utf-8.xml> for one of my
two CharmapML experiments.

Frank

This archive was generated by hypermail 2.1.5 : Sat Jan 20 2007 - 19:47:45 CST