Re: Comments on <draft-ietf-acap-mlsf-00.txt>?

From: Eric Brunner (teb@opengroup.org)
Date: Wed Jun 04 1997 - 14:40:03 EDT


Oki Pete,

Forgive me for coming to the rescue of Rick McGown ... He has done so
much to get Native American writing systems ... written. This is a bit
long.

Rick asked "What is the purpose of this (draft-ietf-acap-mlsf-00.txt)?"

I think that it is fair to describe a mechanism as failing to meet some
need, e.g., as "inadequate", and then to state that the intended purpose
of the mechanism, which wasn't the purpose originally asserted, remains
unclear. Of course, my personal favorite for an i18n-ish mechanism that
meets the criteria for being "bizarre, inadequate, and pointless" remains
the shift-character hack, whose fundamental purpose I speculated back in
1986 as being selling processing cycles to non-ASCII speakers...

It does seem fair to ask:

        Does this mechanism (best) meet the purpose of "a general access
        mechanism for ... structured lists ... application configuration
        options and addressbooks"?

He also asked "What is the intended audience?"

You've provided one answer. I suspect that the intended audience is Chris
Newman, as author and as chair of acap, and Keith Moore and Harald Alvestrand,
as Applications Area Directors, and I commend anyone who actually cares to
point their browser of choice to:
http://www.ietf.org/html.charters/acap-charter.html

I've added Chris, Keith and Harald to the cc's. I hope they don't mind
getting unicode-list traffic on this I-D.

> ACAP uses UTF-8 encoded strings to transmit certain kinds of data.

You are correct of course that Section 8, "Formal Syntax", in the usual
ABNF of [IMAIL] mentions utf8.

> It was evidently deemed desireable to language tag some of that data.

First, a quibble over voice. "It was evidently deemed desireable ..."

The text has an author, Chris Newman, who may or may not be on-point.
The determination of that is the subject of this discussion, and is not
made clear by assertion alone.

You are incorrect in associating "language" with "tag"; see the following,
also from the same section:
   ...
   tag ::= 1*<any ATOM-CHAR except "+" or "*">
   ...
See: ftp://ftp.ietf.org/internet-drafts/draft-ietf-acap-spec-03.txt

In the actual draft under discussion, Multi-Lingual String Format, the
tag construct appears in usage only in:

> Appendix E. Sample code for selecting the "best" alternative

In section 7, "Formal Grammar", tagging appears as follows:

> MLSF-LANG-TAG = *MLSF-LANG-5 (MLSF-LANG-1 / MLSF-LANG-2 /
> MLSF-LANG-3 / MLSF-LANG-4 / MLSF-LANG-5)
> ;; Encoded version of Language-Tag from RFC 1766
> ;; characters converted to uppercase, with
> ;; A0 added and broken into MLSF-LANG components

In any event, the mechanism refers to rfc1766, and hence to my least
favorite ISO standards, 639 and 3166, and is external to 10646. That is
the real point, isn't it?
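
For readers who do not have rfc1766 to hand: its Language-Tag is a primary
tag of 1 to 8 ASCII letters followed by zero or more hyphen-separated
subtags of 1 to 8 letters each. A minimal Python sketch of that syntax
follows. The name parse_language_tag is hypothetical and appears in neither
draft; the uppercasing only mirrors the grammar comment quoted above, and
the sketch makes no attempt at the MLSF byte encoding itself. It shows only
that the tag vocabulary is plain ASCII text defined outside 10646.

```python
import re

# RFC 1766: Language-Tag = Primary-tag *( "-" Subtag )
# Primary-tag and Subtag are each 1*8ALPHA (ASCII letters only).
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Split an RFC 1766 language tag into (primary, [subtags]).

    Hypothetical helper for illustration only. The parts are uppercased,
    as the grammar comment in the MLSF draft describes; a ValueError is
    raised when the input does not match the RFC 1766 syntax.
    """
    if not _LANGUAGE_TAG.fullmatch(tag):
        raise ValueError("not an RFC 1766 language tag: %r" % (tag,))
    primary, *subtags = tag.upper().split("-")
    return primary, subtags
```

Whether a syntactically valid tag names a language anyone actually speaks
is a separate matter, decided by the 639 and 3166 code lists and the IANA
registry, and the sketch does not check it.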

Rick then asked "Why mess with a perfectly standard format like UTF-8?"

Your answer appears to miss the possible assertion that a sufficient, and
perhaps necessary mechanism already exists, in 10646.

Rick finally asked "Why cannot other mechanisms be used?"

Your answer refers to "the eyes of the IETF". Personally I prefer cites
to "The Eyes of Laura Mars" (a film). I've provided a pointer to the acap
charter; the interested reader can work up to where _our_ process is
described, e.g., RFCs 2028, 2027, 2026. Appealing to authority is simply
a non-starter (see below on how recently the authority appealed to has had
an action item to become competent), particularly when the authority on
the subject is ... (sotto voce) more likely to be found on the unicode
list, than on any IETF list.

Rick mentions the adequacy of the tags, and the referent document, rfc1766.

Your response is specific to the usual CJK issue, but you can take it from
me, if no one else, that appealing to 639 and 3166 is of limited adequacy.
If the limits on the adequacy of 639 and 3166 are not apparent to you, see
either my signature (not in syllabic form, which is the point) or find the
language indigenous to the (217) area code in your ultimate referent documents.

You close somewhat chastisingly to Rick's brief note, but to my own reading
conclude incorrectly that no reasoning was provided. I grant that some form
of context was presumed on Rick's part. As you suggest to Rick, and as I'm
in the habit of doing, I've cc'd the I-D author and the relevant Area
Directors. The author may wish to suggest changes and motivation for such
to the WG mailing list -- if no changes are forthcoming after substantive
criticism on this list (unicode), then process action described in _our_
process (see above) will follow.

For other, perhaps motivational discussion of the rather novel appearance
of i18n-esque issues in I-Ds and RFCs, I commend anyone who actually cares
to point their browser of choice to:
http://info.internet.isi.edu:80/IAB/IABmins.970408

These are the minutes of the most recent IAB meeting, and under "Administrivia", the
following appears:

    + Chris Weider swan song on character sets: every RFC that describes
    transport of text must have a section on character set handling; if does
    not use 10646, explain why; finally, must get 10646 on line (Harald
    Alvestrand is doing this).
    
Clearly, the I-D "Application Configuration Access Protocol" describes the
presentation of text, if not the transport, hence character set handling.

Equally clearly, the "if does not (exclusively) use 10646, explain why"
test does not appear to have been met in draft-ietf-acap-mlsf-00.txt.

Finally, the notion that any WG will fail to inform itself, when responding
to an IAB directive ("must" has a specific meaning in this context), of the
best available technical information specific to that directive, seems to be
very foreign to the community I've been a part of for a while.

Aside to Rick --

In your reply to Chris you appear to fail to distinguish between an RFC and
an Internet Draft. Don't worry, it is just an I-D at this point, and easily
fixed. Then again, obsoleting RFCs can be accomplished reasonably, one just
asks and offers. The CJK red herring is something I don't have the energy or
time to fish out either -- we should have a FAQ for these urban myths.

I concur with the fix you suggest to Chris, a layered encoding free of the
temptation of "idle bits" (the devil's plaything).

Personally, I suspect that rfc2130 has an IETF readership mostly restricted
to readers of draft-ietf-stdguide-ops-04.txt "Guide for Internet standards
Writers".

Here is a quote:
3.4 Character Sets

  At one time the Internet had a geographic boundary and was English
  only. Since the Internet now extends internationally, application
  protocols must assume that the contents of any text string may be in
  a language other than English. Therefore, new or updated protocols
  which transmit text must use ISO 10646 as the default Coded Character
  Set, and RFC 2044, "UTF-8, a transformation format of Unicode and ISO
  10646" as the default Character Encoding Scheme. An exception is the
  use of US-ASCII for a protocol's controlling commands and replies.
  Protocols that have a backwards compatibility requirement should use
  the default of the existing protocol. This is in keeping with the
  recommendations of RFC 2130, "The Report of the IAB Character Set
  Workshop held 29 February - 1 March 1996."

--
Kitakitamatsinohpowaw (I'll see you again, in Siksika/Blackfeet, Romanized),

Thomas Eric Brunner                  email: teb@opengroup.org
Principal Research Engineer          http://www.opengroup.org/~teb
The Open Group Research Institute    http://www.opengroup.org
11 Cambridge Center                  Tel: (617) 621-7314
Cambridge, MA 02142                  FAX: (617) 621-8696


