Re: UTF8 vs AL32UTF8

From: Jianping Yang (Jianping.Yang@oracle.com)
Date: Tue Jun 12 2001 - 15:05:38 EDT


Peter_Constable@sil.org wrote:

> On 06/12/2001 01:13:48 PM Jianping Yang wrote:
>
> >If you convert < ED A0 80 ED B0 80 > into UTF-16, what does it mean then?
> >I think it definitely means U-00010000.
>
> I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
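The U-00010000 reading comes from treating each 3-byte group as one UTF-16 code unit and then combining the resulting surrogate pair. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the decoder deliberately skips the check a conformant UTF-8 decoder makes (rejecting values in D800-DFFF), since that check is exactly what separates the two readings.

```python
def decode_three_byte(b0: int, b1: int, b2: int) -> int:
    """Decode one 1110xxxx 10xxxxxx 10xxxxxx group into a 16-bit value.

    Hypothetical helper: it does NOT reject surrogate values, unlike UTF-8.
    """
    if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("not a 3-byte sequence")
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


seq = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
hi = decode_three_byte(*seq[0:3])  # 0xD800
lo = decode_three_byte(*seq[3:6])  # 0xDC00
print(f"U+{combine_surrogates(hi, lo):04X}")  # U+10000
```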

So UTF-8 is not compatible with UTF-16 even in its repertoire, and a
round-trip conversion is not guaranteed, which may be a *big* issue.

>
> UTF-8 has no 6-byte sequences. It must be something else, like the thing
> informally designated in our discussions as UTF-8S.
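Peter's point can be checked against a codec that follows the current definition of UTF-8. Assuming Python 3, the built-in codec encodes U+10000 as the four bytes F0 90 80 80 and, in strict mode, refuses the six-byte surrogate form; only the non-default `surrogatepass` error handler produces it. The helper `is_strict_utf8` is a hypothetical name used for illustration.

```python
SIX_BYTES = b"\xed\xa0\x80\xed\xb0\x80"


def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")  # default errors="strict"
        return True
    except UnicodeDecodeError:
        return False


# Conformant UTF-8 encodes U+10000 as one 4-byte sequence.
four_bytes = "\U00010000".encode("utf-8")
print(four_bytes.hex(" "))  # f0 90 80 80

# The 6-byte form appears only if each surrogate is encoded on its own.
print("\ud800\udc00".encode("utf-8", "surrogatepass") == SIX_BYTES)  # True

print(is_strict_utf8(four_bytes), is_strict_utf8(SIX_BYTES))  # True False
```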

The UTF-8S proposal preserves round-trip conversion between UTF-16 and UTF-8S.
Please don't confuse UTF-8S with UTF-8; under the proposal they are different
encoding forms.
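A round-trip sketch, assuming UTF-8S means what the < ED A0 80 ED B0 80 > example implies: every UTF-16 code unit, surrogates included, is encoded separately as a 1- to 3-byte sequence (the scheme now known as CESU-8). The function names are hypothetical and not taken from the proposal or any library; Python's `surrogatepass` error handler does the per-unit encoding.

```python
import struct


def utf16le_to_utf8s(data: bytes) -> bytes:
    """Encode each UTF-16LE code unit on its own; pairs are never combined."""
    if len(data) % 2:
        raise ValueError("odd-length UTF-16 input")
    units = struct.unpack(f"<{len(data) // 2}H", data)
    # chr() of each unit keeps surrogates as separate code points, and
    # 'surrogatepass' writes each one as its own 3-byte sequence.
    return "".join(chr(u) for u in units).encode("utf-8", "surrogatepass")


def utf8s_to_utf16le(data: bytes) -> bytes:
    """Decode 1- to 3-byte sequences back into UTF-16LE code units."""
    text = data.decode("utf-8", "surrogatepass")  # ED A0 80 -> U+D800
    if any(ord(c) > 0xFFFF for c in text):
        raise ValueError("4-byte UTF-8 sequence is not valid in this scheme")
    return struct.pack(f"<{len(text)}H", *(ord(c) for c in text))


utf16 = "A\U00010000".encode("utf-16-le")  # 41 00 00 d8 00 dc
utf8s = utf16le_to_utf8s(utf16)
print(utf8s.hex(" "))  # 41 ed a0 80 ed b0 80
assert utf8s_to_utf16le(utf8s) == utf16  # round trip holds
```

Rejecting 4-byte sequences on the way back is a design choice in this sketch: accepting them would make two different byte strings map to the same UTF-16 text, which is the ambiguity the thread is arguing about.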

Regards,
Jianping.

>
>
> - Peter
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <peter_constable@sil.org>





This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT