Re: ASCII as a subset of Unicode (was: Re: Oxford proposes a leaner alphabet)

From: Hans Aberg (haberg@math.su.se)
Date: Sat Apr 11 2009 - 15:30:11 CDT

    On 11 Apr 2009, at 21:26, Doug Ewell wrote:

    >> I thought ASCII defined its characters as bytes, whereas Unicode
    >> uses code-points which when mapped using UTF-8 will contain the
    >> ASCII as a subset.
    >
    > The *set of characters* in ASCII is a proper and intact subset of
    > Unicode. How these characters are represented inside computer
    > storage and transmission protocols may be defined differently, and
    > doesn't affect my argument that "ASCII characters" and "Unicode
    > characters" are not disjoint sets.

    > Actually, I was under the impression that ASCII was defined in terms
    > of 7-bit code units, whereas there are virtually no computers or
    > users today who think in terms of 7-bit code units.

    Most likely that is because, in the past, it was common to treat the
    8th bit as a check bit: it could be altered as one pleased in
    transmission, depending on how one set it. This led to MIME.
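
    As an illustration of the check-bit idea, here is a minimal Python
    sketch that assumes an even-parity convention. The function names
    are hypothetical and the parity convention is an assumption; actual
    links could be configured in other ways.

```python
def with_even_parity(code: int) -> int:
    """Set the 8th bit so that the byte has an even number of 1 bits."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has odd weight
    return code | (parity << 7)


def strip_eighth_bit(byte: int) -> int:
    """Recover the 7-bit code, whatever the 8th bit was set to."""
    return byte & 0x7F


# 'C' is 0x43 (three 1 bits), so the parity bit is set on the wire.
sent = with_even_parity(ord("C"))
received = strip_eighth_bit(sent)
```

    The 7-bit code is the same whether or not the 8th bit is set, so a
    channel that does not preserve that bit cannot carry 8-bit data
    unchanged.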

    But I think that, because of this tie to 7-bit bytes, the formally
    correct description is that there is a defined canonical injection
    from the ASCII character set into the Unicode character set. It is
    then common to identify ASCII with its image under that injection.
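
    The injection can be sketched in Python as follows. The function
    name ascii_to_unicode is hypothetical; the sketch models an ASCII
    character as a 7-bit integer code and relies only on the fact that
    Unicode code points U+0000..U+007F correspond to the ASCII codes.

```python
def ascii_to_unicode(code: int) -> str:
    """Map a 7-bit ASCII code to the Unicode character with the same number."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    return chr(code)


codes = list(range(128))
image = [ascii_to_unicode(c) for c in codes]

# Injective: 128 distinct codes map to 128 distinct Unicode characters.
is_injective = len(set(image)) == len(codes)

# Under UTF-8, each character of the image is encoded as a single byte
# whose value equals the original ASCII code.
utf8_matches = all(ch.encode("utf-8") == bytes([c]) for c, ch in zip(codes, image))
```

    Both checks hold, which is why ASCII is usually just identified
    with the first 128 code points.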

       Hans


