From: Martin Duerst (duerst@w3.org)
Date: Mon Jan 24 2005 - 02:07:54 CST
At 13:24 05/01/20, Peter Constable wrote:
>If anyone ever assumed UTF-8 is compatible with ASCII, they were
>mistaken. An ASCII processor can expect to receive octets strictly in
>the range 0 - 127, period, whereas clearly UTF-8 data can contain octets
>outside that range.
>
>ASCII is forward compatible with UTF-8 (a UTF-8 processor can process
>ASCII data), not the other way around.
Well, there is more to it than that. There is a large class
of programs and tools out there that process 8-bit data, but
look only at the ASCII values (bytes 0x00-0x7F) and pass all
other bytes through untouched. Such tools work with a lot
of encodings, starting with iso-8859-1, but not with some
others such as Shift_JIS, where bytes in the ASCII range can
appear as the second byte of a two-byte character. UTF-8
without a BOM works with such tools, but with a BOM, it doesn't.
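
To make the point above concrete, here is a rough sketch in Python.
The two "tools" (count_fields and starts_with_shebang) are hypothetical
stand-ins for ASCII-only programs, not taken from any real tool; the
sample strings are likewise just illustrations.

```python
# Hypothetical ASCII-only tools: they inspect only specific ASCII byte
# values and treat every other byte as opaque data.

def count_fields(line: bytes) -> int:
    """Split a line on the backslash byte (0x5C) and count the fields."""
    return len(line.split(b"\\"))

def starts_with_shebang(data: bytes) -> bool:
    """Check for the ASCII marker '#!' at the very start of the data."""
    return data.startswith(b"#!")

# One real backslash between two non-ASCII letters.
latin_text = "\u00e9\\\u00fc"            # e-acute, backslash, u-umlaut
fields_latin1 = count_fields(latin_text.encode("iso-8859-1"))
fields_utf8 = count_fields(latin_text.encode("utf-8"))

# KATAKANA LETTER SO contains no backslash as text, but its Shift_JIS
# encoding is 0x83 0x5C, so the second byte looks like a backslash.
so = "\u30bd"
fields_so_utf8 = count_fields(so.encode("utf-8"))
fields_so_sjis = count_fields(so.encode("shift_jis"))

# The same script with and without a UTF-8 BOM ("utf-8-sig" prepends
# the three bytes EF BB BF when encoding).
script = "#!/bin/sh\necho hello\n"
shebang_no_bom = starts_with_shebang(script.encode("utf-8"))
shebang_with_bom = starts_with_shebang(script.encode("utf-8-sig"))

print(fields_latin1, fields_utf8)        # both see exactly one backslash
print(fields_so_utf8, fields_so_sjis)    # Shift_JIS yields a false split
print(shebang_no_bom, shebang_with_bom)  # the BOM hides the '#!' marker
```

In UTF-8 every byte of a non-ASCII character is 0x80 or above, so a
tool that looks only at ASCII values never sees a false match. The BOM
breaks this only in the sense shown here: it puts three non-ASCII
bytes in front of whatever the tool expects at the start of the data.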
Another point, as already mentioned, is that US-ASCII text
encoded as UTF-8 is still US-ASCII if there is no BOM, but no
longer US-ASCII if there is a BOM.
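
This second point can be checked byte for byte. The following is only
a small sketch; is_us_ascii is a hypothetical helper written for this
example, and the sample text is arbitrary.

```python
import codecs

def is_us_ascii(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range 0-127."""
    return all(octet < 0x80 for octet in data)

text = "plain ASCII text"

as_ascii = text.encode("ascii")
as_utf8 = text.encode("utf-8")             # no BOM
as_utf8_bom = codecs.BOM_UTF8 + as_utf8    # EF BB BF prepended

print(as_utf8 == as_ascii)        # identical octets without a BOM
print(is_us_ascii(as_utf8))       # still valid US-ASCII
print(is_us_ascii(as_utf8_bom))   # the BOM octets are outside 0-127
```

So a strict ASCII processor of the kind Peter describes accepts the
first form and rejects the second, although both are UTF-8 encodings
of the same ASCII-only text.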
Regards, Martin.