Re: Comments on <draft-ietf-acap-mlsf-00.txt>?

From: Unicode Discussion (unicode@justsystem.co.jp)
Date: Tue Jun 03 1997 - 16:37:22 EDT


> Any comments on
> ftp://ds.internic.net/internet-drafts/draft-ietf-acap-mlsf-00.txt
> ?

> Language tags are encoded by mapping them to upper-case, then
> adding hexadecimal A0 to each octet. The result is broken up into
> groups of five octets followed by a final group of five or fewer
> octets. Each group is prefixed by a UTF-8-style length count with
> the low bits set to 0.
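To make the quoted rule concrete, here is a minimal Python sketch of the tag encoding as I read it. The quote does not spell everything out, so these are assumptions: the "UTF-8-style length count" for a group of n octets is the lead octet of an (n+1)-octet UTF-8 sequence with its payload bits zeroed (C0, E0, F0, F8, FC for n = 1..5); the tag is split into groups of at most five octets, with only the last one shorter; and tags contain only letters, digits and "-". The names `length_prefix` and `encode_language_tag` are hypothetical.

```python
import string

# Characters assumed to be allowed in a language tag (before upper-casing).
TAG_CHARS = set(string.ascii_letters + string.digits + "-")


def length_prefix(n: int) -> int:
    """Lead octet of an (n+1)-octet UTF-8 sequence with its payload bits 0.

    n = 1..5 gives C0, E0, F0, F8, FC (assumed reading of the draft).
    """
    if not 1 <= n <= 5:
        raise ValueError("a group holds 1 to 5 octets")
    return (0xFF << (7 - n)) & 0xFF


def encode_language_tag(tag: str) -> bytes:
    """Sketch of the quoted MLSF tag encoding (one possible reading)."""
    if not tag or any(c not in TAG_CHARS for c in tag):
        raise ValueError(f"unsupported language tag: {tag!r}")
    # Map to upper case, then add hexadecimal A0 to each octet.
    shifted = bytes(ord(c) + 0xA0 for c in tag.upper())
    out = bytearray()
    # Groups of five octets; the final group may be shorter.
    for i in range(0, len(shifted), 5):
        group = shifted[i:i + 5]
        out.append(length_prefix(len(group)))
        out += group
    return bytes(out)


print(encode_language_tag("en-US").hex())  # prints fce5eecdf5f3
```

Under this reading "en-US" becomes FC E5 EE CD F5 F3, and a two-letter tag such as "fr" becomes E0 E6 F2.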

If I have not misunderstood UTF-8 or "MLSF" completely:

A.
  1. A UTF-8-style length count with the low bits set to 0 (e.g.
     hexadecimal FC before a group of five octets) is **not** an
     "illegal" UTF-8 "start character code" octet.

  2. Adding hexadecimal A0 to the "ASCII" codes for A-Z produces
     the octets E1 through FA. Each of these is an "illegal" UTF-8
     continuation octet, but *is* a legal "start character code"
     octet (111xxxxx, where each x may be 1 or 0 independently of
     the others, with some exclusions).

   I think this would confuse most UTF-8 decoders, and is unlikely
   to be silently ignored.
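Points 1 and 2 can be checked mechanically. Below is a minimal sketch of the octet checks a strict decoder makes under the 1997 definition of UTF-8 (RFC 2044, which still allowed sequences of up to six octets); it does not check for overlong forms. The example bytes are what I believe the quoted rules give for "en-US" (an FC prefix, then E5 EE CD F5 F3), and the names `sequence_length` and `first_error` are hypothetical.

```python
from typing import Optional


def sequence_length(lead: int) -> Optional[int]:
    """Length of the sequence a lead octet announces, or None if the
    octet cannot start a sequence (1997-era UTF-8, up to six octets)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead < 0xC0 or lead >= 0xFE:
        return None  # 10xxxxxx is continuation-only; FE and FF never occur
    n = 0
    while lead & (0x80 >> n):
        n += 1  # count the leading 1 bits
    return n


def first_error(data: bytes) -> Optional[int]:
    """Position at which a strict decoder gives up, or None if well-formed.
    Overlong forms are not checked."""
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if n is None:
            return i
        for j in range(i + 1, i + n):
            # Every following octet must be a continuation octet 10xxxxxx.
            if j >= len(data) or data[j] & 0xC0 != 0x80:
                return j
        i += n
    return None


tag = bytes([0xFC, 0xE5, 0xEE, 0xCD, 0xF5, 0xF3])  # assumed MLSF form of "en-US"
print(sequence_length(tag[0]), sequence_length(tag[1]), first_error(tag))  # prints 6 3 1
```

The FC prefix is accepted as the start of a six-octet sequence, but the very next octet, E5 ("E" + A0), is itself a start octet, so the check fails at index 1 rather than skipping over the tag.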

B. This trick is designed for UTF-8 only: the tags are octet
   sequences that encode no characters, so it does *not* work for
   Unicode/ISO/IEC 10646 in general. This means it **cannot** be
   transformed into UTF-16 (or UCS-4) without using some
   *other* way of representing the language tags.
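Point B can be illustrated with Python's standard codecs (a sketch, not part of any MLSF tooling). A UTF-8 to UTF-16 or UCS-4 converter has to decode the octets to characters first, and the tag octets decode to no characters at all: a strict conversion fails, and a lenient one turns the tag into U+FFFD replacement characters. The tag bytes are my assumed encoding of "en-US"; `transcode_strict` and `transcode_lossy` are hypothetical names.

```python
from typing import Optional

# Assumed MLSF octets for the tag "en-US" (hypothetical example data).
TAG = bytes([0xFC, 0xE5, 0xEE, 0xCD, 0xF5, 0xF3])


def transcode_strict(data: bytes, target: str = "utf-16-be") -> Optional[bytes]:
    """UTF-8 -> target encoding; None if the input is not valid UTF-8."""
    try:
        return data.decode("utf-8").encode(target)
    except UnicodeDecodeError:
        return None


def transcode_lossy(data: bytes, target: str = "utf-16-be") -> bytes:
    """UTF-8 -> target encoding, replacing undecodable octets with U+FFFD."""
    return data.decode("utf-8", errors="replace").encode(target)


text = TAG + "Hej".encode("utf-8")
print(transcode_strict(text))                     # None: the tag is rejected
print(transcode_lossy(text).decode("utf-16-be"))  # U+FFFD characters, then "Hej"
print(transcode_strict(text, "utf-32-be"))        # UCS-4 fares no better: None
```

Either way the language information is gone by the time the text is in UTF-16 or UCS-4.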

C. "Higher level protocols" (e.g. MS-doc/RTF, HTML, etc., etc.)
   seem to be a more suitable place for handling language tags
   (which is where they are handled now).

IMHO, MLSF should thus **not** be used.

                        /kent karlsson
-----------
Any opinions expressed are my personal ones, etc., etc., ...



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:34 EDT