Re: 32'nd bit & UTF-8

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Jan 19 2005 - 01:37:53 CST

    Hans Aberg <haberg at math dot su dot se> wrote:

    >>>> The old RFC you're referring to is not designating UTF-8, but
    >>>> UTF-BSS, which is a transformation format,
    >>>
    >>> OK. Fine, so we have a name for it.
    >>
    >> I was not sure about the name of it when writing the message.
    >
    > According to <http://www.cl.cam.ac.uk/~mgk25/unicode.html>, UTF is
    > short for UCS Transformation Format, where UCS stands for Universal
    > Character Set. When speaking about the extensions that I speak about,
    > I think they should certainly have a separate name. Perhaps UTF-8X for
    > extended, or BTF-8 for "bit (byte) transformation format".

    RFC 2044, the original (1996) Internet definition of UTF-8, defined up
    to 6-byte sequences.

    While RFC 2044 has been superseded (RFC 2279, 1998) and re-superseded
    (RFC 3629, 2003), and the 5- and 6-byte sequences have been removed, the
    point is that they were originally defined in an encoding scheme called
    "UTF-8." It is not true that they were only defined under some other
    name, such as FSS-UTF (the name used in Unicode 1.1) or "UTF-BSS,"
    whatever that is.
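
    For readers who have not seen the longer forms, here is a minimal
    sketch of the original 1- to 6-byte bit pattern. The function name
    `encode_utf8_original` and the `max_len` parameter are my own
    illustrative choices, not taken from any RFC. Setting `max_len=4` only
    approximates the current definition: RFC 3629 also caps values at
    U+10FFFF and excludes surrogates, which this sketch does not check.

```python
def encode_utf8_original(cp: int, max_len: int = 6) -> bytes:
    """Encode one value with the original 1- to 6-byte UTF-8 bit pattern.

    Hypothetical helper for illustration only; max_len is an assumed knob
    for rejecting the longer forms.
    """
    # (largest value for this length, lead-byte marker) for lengths 1..6
    limits = [
        (0x7F, 0x00),        # 0xxxxxxx
        (0x7FF, 0xC0),       # 110xxxxx + 1 continuation byte
        (0xFFFF, 0xE0),      # 1110xxxx + 2
        (0x1FFFFF, 0xF0),    # 11110xxx + 3
        (0x3FFFFFF, 0xF8),   # 111110xx + 4
        (0x7FFFFFFF, 0xFC),  # 1111110x + 5
    ]
    if cp < 0:
        raise ValueError("value must be non-negative")
    for length, (upper, marker) in enumerate(limits, start=1):
        if cp <= upper:
            if length > max_len:
                raise ValueError(f"needs {length} bytes, limit is {max_len}")
            if length == 1:
                return bytes([cp])
            tail = []
            # Peel off six bits at a time, lowest first, as 10xxxxxx bytes.
            for _ in range(length - 1):
                tail.append(0x80 | (cp & 0x3F))
                cp >>= 6
            # Whatever bits remain go into the lead byte after the marker.
            return bytes([marker | cp] + tail[::-1])
    raise ValueError("value needs more than 31 bits")


if __name__ == "__main__":
    for value in (0x41, 0x20AC, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(f"{value:#x}: {encode_utf8_original(value).hex(' ')}")
```

    Note that even the 6-byte form carries only 31 bits, so a full 32-bit
    value does not fit in this pattern.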

    "BTF-8" is taken; see:

    http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0830.html

    > The Unicode standard is like Big Brother in George Orwell's "1984",
    > making it possible to only speak about what is right, but not what is
    > wrong.

    My goodness.

    > Besides, even though Unicode has declared to never use more than 21
    > bits, in the track record, Unicode has reneged on such promises. It
    > might be prudent to knock down a full 32-bit encoding, declaring
    > UTF-8/32 to be subsets of that.

    I suppose the "promise" that you are referring to, on which Unicode
    "reneged," was the original 16-bit design that was extended with the use
    of surrogate pairs.

    The difference between finding 65,000 things that need to be encoded and
    finding 1.1 million things that need to be encoded is the difference
    between night and day.
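
    The arithmetic behind those two figures can be checked quickly. This
    is only a sketch; the constant names are mine, and the surrogate
    ranges are the standard D800..DBFF and DC00..DFFF.

```python
# Size of the original 16-bit code space.
BMP_SIZE = 0x10000                        # 65,536

# Each surrogate pair combines one high and one low surrogate.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1     # 1,024
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1      # 1,024
SUPPLEMENTARY = HIGH_SURROGATES * LOW_SURROGATES   # 1,048,576

# Total code space reachable with 16-bit units plus surrogate pairs.
TOTAL = BMP_SIZE + SUPPLEMENTARY          # 1,114,112

MAX_CODE_POINT = TOTAL - 1                # 0x10FFFF

if __name__ == "__main__":
    print(f"16-bit design:        {BMP_SIZE:,}")
    print(f"with surrogate pairs: {TOTAL:,}")
    print(f"growth factor:        {TOTAL // BMP_SIZE}x")
    print(f"bits needed:          {MAX_CODE_POINT.bit_length()}")
```

    The largest value, 0x10FFFF, fits in 21 bits, which is where the
    21-bit figure quoted above comes from.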

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


