Re: Communicator Unicode

From: Francois Yergeau (yergeau@alis.com)
Date: Tue Sep 16 1997 - 09:52:02 EDT


At 01:31 16/09/97 -0700, KNAPPEN@MZDMZA.ZDV.UNI-MAINZ.DE wrote:
>Does the IANA currently discriminate between _character set_ and _content
>transfer encoding_ in a way similar to what MIME does?

The IANA is only the registry for various MIME tags (content-types,
charsets and transfer encodings). The distinctions do come straight from
MIME.

>In my view, there is
>only one underlying character set (Unicode/ISO 10646), but there are several
>content transfer encodings:
>
>32bit or UCS-4
>16bit or UCS-2 (including UTF-16)
>UTF-8
>UTF-7

These are not content transfer encodings (CTEs) as defined in MIME, but
simply encodings of a single character *repertoire*. To be used at all, a
repertoire has to be encoded somehow; CTEs are applied to *already* encoded
data (text or not) and serve a different purpose, as per MIME (RFC 2045):

          Encoding transformations other than the identity
          transformation are usually applied to data in order to
          allow it to pass through mail transport mechanisms
          which may have data or character set limitations.

One requirement for CTEs is that they be universal. RFC 2048 says:

   All transfer encodings must be applicable to an arbitrary sequence of
   octets of any length. Dependence on particular input forms is not
   allowed.

Clearly none of the above has this property; they apply only to text.
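
A minimal Python sketch of that difference, under stated assumptions: base64
stands in for a MIME CTE, the sample octets are arbitrary, and the helper
name is_valid_utf8 is hypothetical (it is not from MIME or any RFC). It
shows that a CTE round-trips any octet sequence, that arbitrary octets are
not necessarily well-formed UTF-8, and how the two layers stack.

```python
import base64

# Arbitrary octets, e.g. a fragment of binary data; deliberately not valid UTF-8.
raw = bytes([0xFF, 0xFE, 0x00, 0x80])

# A CTE (base64 here) accepts any octet sequence and round-trips it.
armored = base64.b64encode(raw)
restored = base64.b64decode(armored)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the octets form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The layering: characters -> octets (character encoding),
# then octets -> mail-safe octets (content transfer encoding).
text = "Montréal"
encoded = text.encode("utf-8")
transfer = base64.b64encode(encoded)
roundtrip = base64.b64decode(transfer).decode("utf-8")
```

Here is_valid_utf8(raw) is false while base64 handles raw without complaint,
which is the universality that RFC 2048 asks of a transfer encoding.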

By contrast, a MIME charset tag is required to give enough information to
map from a sequence of bytes to a sequence of characters. Hence it must
include a specification of the encoding of the character repertoire.
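
As a rough illustration (a sketch only; the codec names are Python's, used
here as stand-ins for MIME charset labels), the same two octets map to
different characters depending on the label, and the same character maps to
different octets under different encodings of 10646:

```python
# The same octets, interpreted under two different charset labels.
octets = b"\xc3\xa9"
as_utf8 = octets.decode("utf-8")          # one character: U+00E9
as_latin1 = octets.decode("iso-8859-1")   # two characters: U+00C3, U+00A9

# The same character, encoded under two different encodings of the repertoire.
char = "\u00e9"
in_utf8 = char.encode("utf-8")            # two octets
in_utf16be = char.encode("utf-16-be")     # two different octets
```

Without the label, a receiver cannot tell which of these mappings the
sender intended.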

>It would be very helpful to have different tags for the character set itself
>and the transformation format. Otherwise you end up with a product
>(ISO-10646 version) * (transformation-format) of different character sets
>to be registered -- clearly an uneconomical approach.

For reasons given in draft-yergeau-utf8-rev-01.txt, one does not want the
version info in general, so the MIME charset tag basically reduces to a
specification of the encoding (not CTE) of 10646.

-- 
François Yergeau <yergeau@alis.com>
Alis Technologies inc., Montréal
Tel: +1 (514) 747-2547
Fax : +1 (514) 747-2561


