RE: UTF-8 syntax

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Fri Jun 08 2001 - 15:06:29 EDT


> From: Jianping Yang [mailto:Jianping.Yang@oracle.com]

> This will fix the following problem, for example:
> A search engine searching for the character U-00010000 in a
> UTF-8 string could not find it. But when the UTF-8 is converted
> into UTF-16, it can be found there, because <ED A0 80> and
> <ED B0 80> are converted into U-00010000 in UTF-16.

        (scratches head)

        HUH?

        To find U-00010000 in UTF-8, just search for <F0 90 80 80>[1] and
find it. If you convert to UTF-16, you will need to search for something
else[2], which will not be <00010000>[4], which is the UTF-32
representation. So I fail to see how anything gets "fixed" here.
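
        For anyone without a pocket encoder handy, here is a rough Python
sketch of the three encodings of U-00010000, plus the six-byte sequence from
the quoted message. The helper name hex_bytes and the sample haystack are
made up for illustration, and the "surrogatepass" error handler is used only
to reproduce <ED A0 80> <ED B0 80>, which standard UTF-8 does not permit.

```python
# Illustrative sketch only: hex_bytes and haystack are hypothetical names.
ch = "\U00010000"

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

utf8 = ch.encode("utf-8")        # F0 90 80 80
utf16 = ch.encode("utf-16-be")   # D8 00 DC 00 (surrogate pair D800 DC00)
utf32 = ch.encode("utf-32-be")   # 00 01 00 00

# The six-byte form from the quoted message: each UTF-16 surrogate is
# encoded on its own as if it were a character. "surrogatepass" lets
# Python emit these bytes even though they are not valid UTF-8.
surrogate_form = "\ud800\udc00".encode("utf-8", "surrogatepass")

# Searching proper UTF-8 for the character is a plain byte search.
haystack = ("abc" + ch + "def").encode("utf-8")
position = haystack.find(utf8)

print("UTF-8 :", hex_bytes(utf8))
print("UTF-16:", hex_bytes(utf16))
print("UTF-32:", hex_bytes(utf32))
print("Six-byte form:", hex_bytes(surrogate_form))
print("Found at byte offset:", position)
```

        The search only fails if the data holds the six-byte form while the
search key is the four-byte form (or the other way around), which is a
consequence of having two byte sequences for one character.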

        I am getting more convinced as this goes along that there is not a
single technical reason for UTF-8s.

/|/|ike

[1] - Byte conversion courtesy of Cima's UTF-8 Magic Pocket Encoder[3].

[2] - I can't convert UTF-16 ... Marco? Please? How about a UTF-16 Magic
Pocket Encoder?

[3] - Which is NOT used to encode magic pockets.

[4] - Magic Pocket Encoder not necessary for this one.


