RE: Subject: Re: 32'nd bit & UTF-8

From: Martin Duerst ([email protected])
Date: Mon Jan 24 2005 - 02:07:54 CST

Next message: Martin Duerst: "RE: Subject: Re: 32'nd bit & UTF-8"

Previous message: Martin Duerst: "Re: 32'nd bit & UTF-8"
Maybe in reply to: Arcane Jill: "Subject: Re: 32'nd bit & UTF-8"
Next in thread: Martin Duerst: "RE: Subject: Re: 32'nd bit & UTF-8"

At 13:24 05/01/20, Peter Constable wrote:

>If anyone ever assumed UTF-8 is compatible with ASCII, they were
>mistaken. An ASCII processor can expect to receive octets strictly in
>the range 0 - 127, period, whereas clearly UTF-8 data can contain octets
>outside that range.
>
>ASCII is forward compatible with UTF-8 (a UTF-8 processor can process
>ASCII data), not the other way around.

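Well, there is more to it than that. There is a large class
of programs and tools out there that process 8-bit data but
look only at octets in the ASCII range and pass everything
else through untouched. Such tools work with many encodings,
starting with ISO-8859-1, but not with some others such as
Shift_JIS, where an octet in the ASCII range can be the
second octet of a two-octet character. UTF-8 without a BOM
works with such tools, but UTF-8 with a BOM doesn't, because
the data then starts with three non-ASCII octets that the
tool does not expect.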
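
To make this concrete, here is a rough Python sketch of two
such "ASCII-only" checks. The helper names (count_ascii_byte,
starts_with_shebang) and the sample strings are made up for
illustration; they stand in for whatever a real byte-oriented
tool does and are not taken from any particular program.

```python
BACKSLASH = 0x5C  # ASCII '\', which many tools treat as an escape or separator


def count_ascii_byte(data: bytes, value: int = BACKSLASH) -> int:
    """Count one ASCII octet value, the way a byte-oriented tool would see it."""
    return sum(1 for b in data if b == value)


def starts_with_shebang(data: bytes) -> bool:
    """Hypothetical check: does the data begin with the ASCII octets '#!'?"""
    return data.startswith(b"#!")


# The text contains no backslash at all.
text = "表示"
utf8_hits = count_ascii_byte(text.encode("utf-8"))      # multi-octet sequences stay >= 0x80
sjis_hits = count_ascii_byte(text.encode("shift_jis"))  # second octet of the first character is 0x5C

# The same ASCII script, encoded as UTF-8 without and with a BOM.
script = "#!/bin/sh\n"
plain_ok = starts_with_shebang(script.encode("utf-8"))
bom_ok = starts_with_shebang(script.encode("utf-8-sig"))  # Python's codec that prepends EF BB BF

print(utf8_hits, sjis_hits, plain_ok, bom_ok)
```

The Shift_JIS data produces a false backslash match, the
UTF-8 data does not, and the shebang check only fails once
the BOM is in front.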

Another point, as already mentioned, is that encoding US-ASCII
as UTF-8 is still US-ASCII if there is no BOM, but no longer
US-ASCII if there is a BOM.
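
A small Python sketch of that second point, assuming
"is US-ASCII" simply means that every octet is below 0x80.
The helper is_us_ascii and the sample text are illustrative
only.

```python
import codecs


def is_us_ascii(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range 0x00-0x7F."""
    return all(b < 0x80 for b in data)


text = "plain ASCII text"
without_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom  # BOM_UTF8 is the octets EF BB BF

# ASCII-only text encoded as UTF-8 is octet-for-octet the same as US-ASCII.
same_bytes = without_bom == text.encode("ascii")

print(same_bytes, is_us_ascii(without_bom), is_us_ascii(with_bom))
```

Without the BOM the two encodings are indistinguishable;
with it, the first three octets fall outside the ASCII range.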

Regards, Martin.


This archive was generated by hypermail 2.1.5 : Mon Jan 24 2005 - 19:27:30 CST