Re: UTF-8 syntax

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 08 2001 - 21:21:09 EDT

Next message: Jianping Yang: "Re: UTF-8 syntax"
Previous message: Lars Marius Garshol: "Re: UTF-8 syntax"
Maybe in reply to: DougEwell2@cs.com: "Re: UTF-8 syntax"
Next in thread: Jianping Yang: "Re: UTF-8 syntax"
Reply: Jianping Yang: "Re: UTF-8 syntax"

Jianping said:

> The issue comes from unpaired surrogates as <ED A0 80> and <ED B0 80>

These are not *unpaired* surrogates -- they are *paired* surrogates.
Else your equating them to <F0 90 80 80> or U-00010000 would make no sense.
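
As a rough illustration of why the two sequences form a pair, the sketch below combines the code units D800 and DC00 with the standard surrogate arithmetic. It also shows what comes out when the 3-byte UTF-8 bit pattern is applied mechanically to each 16-bit unit. The helper names `combine_surrogates` and `three_byte_form` are made up for this example and do not come from any library.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a high/low surrogate pair to the scalar value it represents."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def three_byte_form(unit: int) -> bytes:
    """Apply the 3-byte UTF-8 bit pattern to one 16-bit value, with no validity check."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


scalar = combine_surrogates(0xD800, 0xDC00)
print(hex(scalar))                          # 0x10000
print(three_byte_form(0xD800).hex(" "))     # ed a0 80
print(three_byte_form(0xDC00).hex(" "))     # ed b0 80
print(chr(scalar).encode("utf-8").hex(" ")) # f0 90 80 80
```

Taken together, <ED A0 80> <ED B0 80> is just D800 DC00 pushed through the 3-byte pattern one unit at a time, which is why equating it with <F0 90 80 80> presupposes a pair.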

> can be
> in UTF-8

They cannot be in well-formed UTF-8. They can only be in ill-formed
UTF-8 of the irregular subtype.
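
A small check along these lines, using Python's built-in strict `utf-8` codec as a stand-in for a conformant decoder. This is only a sketch: the helper `is_well_formed_utf8` is a hypothetical name, and Python's codec follows the later, stricter definition under which these sequences are simply rejected.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


irregular = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])  # surrogates encoded one by one
regular = bytes([0xF0, 0x90, 0x80, 0x80])                # 4-byte form of the same scalar

print(is_well_formed_utf8(irregular))  # False
print(is_well_formed_utf8(regular))    # True
```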

> and your search for <F0 90 80 80> (which is Unicode scalar value
> U-00010000) cannot find it. But however, when the UTF-8 string converted into
> UTF-16, <ED A0 80> and <ED B0 80> will become
> <D800 DC00>, and you can find the same character by searching <D800 DC00> in
> UTF-16.
>
> Unless this unpaired surrogate will be totally eliminated from UTF forms, this
> issue could be hit.

*PAIRED* surrogates.
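
For what it is worth, the search mismatch described in the quoted text can be reproduced in a few lines. This sketch assumes Python's `surrogatepass` error handler as a stand-in for a lenient converter that lets the irregular sequences through; a strict converter would reject the input instead.

```python
irregular = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
needle_utf8 = "\U00010000".encode("utf-8")  # f0 90 80 80

# A byte-wise search for the 4-byte form misses the 6-byte irregular form.
found_in_utf8 = needle_utf8 in irregular

# A lenient conversion to UTF-16 yields the code units D800 DC00 ...
units = irregular.decode("utf-8", errors="surrogatepass")
utf16 = units.encode("utf-16-be", errors="surrogatepass")

# ... so the same character is now found (byte search is aligned in this tiny case).
needle_utf16 = "\U00010000".encode("utf-16-be")  # d8 00 dc 00
found_in_utf16 = needle_utf16 in utf16

print(found_in_utf8, found_in_utf16)  # False True
```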

--Ken


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT