Re: 8-bit text which is supposed to be UTF-8 but isn't

From: John Cowan (jcowan@reutershealth.com)
Date: Mon Jan 31 2000 - 10:09:40 EST


Dan Oscarsson wrote:
 
> Yes, UTF-16 was done right. Unfortunately, UTF-8 was done wrong. Just as
> UTF-16 is compatible with codes in the 16-bit space, UTF-8 should have
> been compatible with the characters in the first 8 bits.

How? An 8-bit code compatible with UTF-16 in its first 8 bits has
no space left to represent the other 109744 codepoints. Unlike the
16-bit codespace from 0 to FFFF, the 8-bit codespace from 0 to FF is
densely packed with characters.
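
The density argument can be checked with a short Python sketch. It counts how
many of the 256 byte values are left unassigned as single-byte characters in
Latin-1 (whose 256 characters match the first 256 code points) and in UTF-8.
The helper name single_byte_values is made up for this illustration; the
codec names "latin-1" and "utf-8" are Python's standard ones.

```python
def single_byte_values(encoding):
    """Return the byte values that decode, on their own, to one character."""
    usable = set()
    for b in range(256):
        try:
            bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            continue  # this byte is not a complete character by itself
        usable.add(b)
    return usable

# Byte values NOT used as stand-alone characters, i.e. available to
# introduce or continue a multi-byte sequence.
free_in_latin1 = 256 - len(single_byte_values("latin-1"))
free_in_utf8 = 256 - len(single_byte_values("utf-8"))

print(free_in_latin1)  # 0: every byte value is already a character
print(free_in_utf8)    # 128: bytes 0x80..0xFF are reserved for sequences

# The price UTF-8 pays: U+00E9 is one byte in Latin-1, two in UTF-8.
print("\u00e9".encode("latin-1"))  # b'\xe9'
print("\u00e9".encode("utf-8"))    # b'\xc3\xa9'
```

With no byte values free, an encoding that kept all 256 single-byte characters
would have nothing left to signal "more bytes follow".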

-- 

Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


