Re: Least used parts of BMP.

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Jun 04 2010 - 08:35:47 CDT

Next message: Otto Stolz: "Re: Hexadecimal digits"

Previous message: Luke-Jr: "Re: Hexadecimal digits"
In reply to: Kannan Goundan: "Re: Least used parts of BMP."
Next in thread: Mark Davis ☕: "Re: Least used parts of BMP."
Reply: Mark Davis ☕: "Re: Least used parts of BMP."

Hello,

On 2010-06-03 07:07, Kannan Goundan wrote:
> This is currently what I do (I was referring to this as the "compact
> UTF-8-like encoding"). The one difference is that I put all the
> marker bits in the first byte (instead of in the high bit of every
> byte):
> 0xxxxxxx
> 10xxxxxx xyyyyyyy
> 110xxxxx xxyyyyyy yzzzzzzz

The problem with this encoding is that the trailing bytes
are not clearly marked: they may start with any of
'0', '10', or '110'; only '111' would mark a byte
unambiguously as a trailing one.
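
To make the ambiguity concrete, here is a minimal sketch of the quoted layout. The names encode_compact and lead_length are hypothetical, chosen only for this illustration, and the bit packing is an assumption read directly off the three quoted forms; it is not taken from Kannan's actual implementation.

```python
def encode_compact(cp: int) -> bytes:
    """Encode per the quoted layout: all marker bits sit in the first byte."""
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x4000:                    # 10xxxxxx xyyyyyyy
        return bytes([0x80 | (cp >> 8), cp & 0xFF])
    if cp < 0x200000:                  # 110xxxxx xxyyyyyy yzzzzzzz
        return bytes([0xC0 | (cp >> 16), (cp >> 8) & 0xFF, cp & 0xFF])
    raise ValueError("value does not fit the three quoted forms")


def lead_length(byte: int):
    """Length implied by a byte if it is read as a first byte, else None."""
    if byte < 0x80:
        return 1
    if byte < 0xC0:
        return 2
    if byte < 0xE0:
        return 3
    return None  # '111' prefix: not a first byte in the quoted forms


encoded = encode_compact(0x0141)       # two bytes: 0x81 0x41
trailing = encoded[1]                  # 0x41, same as a lone U+0041
```

Here the trailing byte 0x41 is indistinguishable from the one-byte encoding of U+0041, so a reader that starts at that offset cannot tell that it is in the middle of a sequence.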

In contrast, in UTF-8 every single byte carries a marker
that unambiguously identifies it as a single ASCII byte,
a starting byte, or a continuation byte; hence you do not have
to go back to the beginning of the whole data stream to recognize
and decode a group of bytes.
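
As an illustration of this property, the following sketch classifies a UTF-8 byte by its top bits and steps back from an arbitrary offset to the start of the enclosing character. The function names classify and char_start are hypothetical, and the sketch assumes well-formed UTF-8 input; it does not reject invalid lead bytes such as 0xC0 or 0xFF.

```python
def classify(byte: int) -> str:
    """Classify a UTF-8 byte from its marker bits alone."""
    if byte < 0x80:            # 0xxxxxxx
        return "ascii"
    if byte < 0xC0:            # 10xxxxxx
        return "continuation"
    return "lead"              # 11xxxxxx (validity not checked here)


def char_start(data: bytes, i: int) -> int:
    """Return the index of the first byte of the character covering data[i]."""
    # Skip backwards over continuation bytes; at most three steps are
    # needed in well-formed UTF-8, regardless of the stream length.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "a\u0141\u20ac".encode("utf-8")   # 1-byte, 2-byte and 3-byte characters
start = char_start(data, 5)              # offset 5 is the last byte of U+20AC
```

Starting from the last byte of the three-byte character, the loop only inspects that character's own bytes before stopping at its lead byte; nothing earlier in the stream is consulted.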

Best wishes,
Otto Stolz

This archive was generated by hypermail 2.1.5 : Fri Jun 04 2010 - 08:37:10 CDT