From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 06:51:11 CST
At 20:24 -0800 2005/01/19, Peter Constable wrote:
>> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
>> Behalf Of Peter Kirk
>
>> This is a very significant point. Because a BOM may be used with UTF-8,
>> UTF-8 is in fact not quite as compatible with ASCII as has been
>> presumed.
>
>If anyone ever assumed UTF-8 is compatible with ASCII, they were
>mistaken. An ASCII processor can expect to receive octets strictly in
>the range 0 - 127, period, whereas clearly UTF-8 data can contain octets
>outside that range.
This has been a problem in the past: ASCII computers and computer programs,
strictly speaking, process only 7 bits, reserving the 8th bit for various
uses (such as parity). Examples are programs like TeX and the UNIX OSs.
But because of the need for the various ISO-Latin and other ISO 8-bit
encodings, this has changed. So these programs now all process not pure
ASCII, but 8-bit bytes, where the lowest 7 bits are often reserved for ASCII.
(For example, MIME was originally invented in order to enable 8-bit transfer
of email, as it proved notoriously difficult to get the Internet email
forwarding software updated to properly handle 8-bit bytes.)
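To make the difference concrete, here is a minimal sketch in Python. The two function names are hypothetical, and clearing the 8th bit is only a simplified model of what a strictly 7-bit program might do; it is not taken from TeX, UNIX, or any mail software.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Model a strictly 7-bit processor: clear the 8th bit of every byte."""
    return bytes(b & 0x7F for b in data)


def pass_through(data: bytes) -> bytes:
    """Model an 8-bit-clean processor: every byte is left untouched."""
    return bytes(data)


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
utf8 = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

damaged = strip_high_bit(utf8)          # 0xC3 -> 0x43 'C', 0xA9 -> 0x29 ')'
intact = pass_through(utf8)

print(damaged)                          # b'cafC)'
print(intact.decode("utf-8") == "caf\u00e9")   # True
```

Pure ASCII input survives both functions unchanged; only the bytes with the 8th bit set are damaged by the 7-bit model.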
>ASCII is forward compatible with UTF-8 (a UTF-8 processor can process
>ASCII data), not the other way around.
So it is not pure ASCII we are speaking about, but programs that are already
capable of handling 8-bit bytes, assuming that the ASCII characters are those
with leading bit 0. Then, without the requirement that the BOM should be
ignored, often little or no change to the software is needed.
Hans Aberg