Re: UTF8 vs AL32UTF8

From: Peter_Constable@sil.org
Date: Tue Jun 12 2001 - 03:06:32 EDT


On 06/11/2001 10:45:46 PM Mark Davis wrote:

[earlier]
> - Oracle could probably make a case for their name for UTF8 simply being an
> anachronism. After all, the original definition of UTF-8 did convert
> surrogate pairs as they are doing in what they call UTF8.

[now]
>UTF-8 was defined before UTF-16. At the time it was first defined, there
>were no surrogates, so there was no special handling of the D800..DFFF code
>points.

The critical thing, though, is that in UTF-8 as originally designed, there
was no question about what < ED A0 80 ED B0 80 > and < F0 90 80 80 > meant,
or about whether either could mean U-00010000. They definitely did not mean
the same thing, and the former definitely did not mean U-00010000. So
Oracle would fail utterly if judged on that basis.
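
To make the byte-level point concrete, here is a minimal sketch in Python of a
decoder that applies only the UTF-8 bit-pattern rules, with no special
handling of D800..DFFF. The function name decode_raw_utf8 is hypothetical, and
the sketch is limited to sequences of up to four bytes (the original
definition allowed longer ones). It is an illustration under those
assumptions, not anyone's actual implementation.

```python
def decode_raw_utf8(data):
    """Decode UTF-8 by bit pattern only, with no surrogate-specific rule.

    Hypothetical helper for illustration; handles 1- to 4-byte sequences.
    Returns a list of integer code values.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the sequence length and its payload bits.
        if lead < 0x80:
            length, value = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, value = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, value = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, value = 4, lead & 0x07
        else:
            raise ValueError("unsupported lead byte 0x%02X" % lead)

        trail = data[i + 1:i + length]
        # Every trailing byte must have the form 10xxxxxx.
        if len(trail) != length - 1 or any((b & 0xC0) != 0x80 for b in trail):
            raise ValueError("malformed sequence at offset %d" % i)

        # Each trailing byte contributes six payload bits.
        for b in trail:
            value = (value << 6) | (b & 0x3F)
        out.append(value)
        i += length
    return out


six_byte_form = bytes.fromhex("EDA080EDB080")
four_byte_form = bytes.fromhex("F0908080")

print([hex(v) for v in decode_raw_utf8(six_byte_form)])
print([hex(v) for v in decode_raw_utf8(four_byte_form)])
```

Under these rules the six-byte sequence yields two separate values, D800 and
DC00, while the four-byte sequence yields the single value 10000. Nothing in
the bit-pattern decoding combines the first pair into U-00010000.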

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT