From: Hans Aberg (haberg@math.su.se)
Date: Sat Apr 11 2009 - 15:30:11 CDT
On 11 Apr 2009, at 21:26, Doug Ewell wrote:
>> I thought ASCII defined its characters as bytes, whereas Unicode
>> uses code-points which when mapped using UTF-8 will contain the
>> ASCII as a subset.
>
> The *set of characters* in ASCII is a proper and intact subset of
> Unicode. How these characters are represented inside computer
> storage and transmission protocols may be defined differently, and
> doesn't affect my argument that "ASCII characters" and "Unicode
> characters" are not disjoint sets.
>
> Actually, I was under the impression that ASCII was defined in terms
> of 7-bit code units, whereas there are virtually no computers or
> users today who think in terms of 7-bit code units.
Most likely, since in the past it was common to treat the 8th bit as a
check bit - it could be altered freely in transmission, depending on
how one set it. This led to MIME.
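
To illustrate the check-bit idea, here is a minimal sketch. It assumes
even parity, which is only one of several conventions that were in use;
the function names are made up for this example.

```python
def with_even_parity(code_unit: int) -> int:
    """Put an even-parity bit in bit 7 of a 7-bit code value (assumed convention)."""
    if not 0 <= code_unit <= 0x7F:
        raise ValueError("not a 7-bit code value")
    # Parity bit is 1 when the 7 data bits contain an odd number of ones,
    # so that the full 8-bit byte always has an even number of ones.
    parity = bin(code_unit).count("1") % 2
    return code_unit | (parity << 7)


def strip_high_bit(byte: int) -> int:
    """Recover the 7-bit code value by discarding bit 7."""
    return byte & 0x7F
```

Under this convention the receiver cannot rely on the 8th bit carrying
data, since only the low seven bits survive as the character value.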
But I think because of this tie to 7-bit bytes, the formally correct
description is that there is a defined canonical injection from the
ASCII character set into the Unicode character set. It is then common
to identify ASCII with its image under that injection.
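
The injection can be sketched as follows. This is only an illustration:
the names ascii_to_unicode and ASCII_IMAGE are invented here, and the
code relies on Python's chr() returning the Unicode character with a
given code point.

```python
def ascii_to_unicode(code_unit: int) -> str:
    """Map a 7-bit ASCII code value to the Unicode character with the same number."""
    if not 0 <= code_unit <= 0x7F:
        raise ValueError("not a 7-bit ASCII code value")
    return chr(code_unit)


# The image of the injection: the code points U+0000..U+007F.
ASCII_IMAGE = [ascii_to_unicode(n) for n in range(128)]

# Injective: 128 distinct code values give 128 distinct characters.
assert len(set(ASCII_IMAGE)) == 128

# Under UTF-8, each character in the image is a single byte whose value
# equals the original ASCII code value.
assert all(c.encode("utf-8") == bytes([n]) for n, c in enumerate(ASCII_IMAGE))
```

Identifying ASCII with ASCII_IMAGE is what makes it natural to call
ASCII a subset of Unicode, even though the two are defined over
different code unit sizes.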
Hans