Re: Provenance of Unicode

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 02 2001 - 19:55:33 EDT


Dean Snyder asked:

> on 4/30/01 3:46 PM, Kenneth Whistler at kenw@sybase.com wrote:
>
> > the unicode@unicode.org discussion list does not develop protocols
>
> I'm embarrassed to ask this, but that's never stopped me before. ;-)
>
> What do you mean by "protocols" when you say that this list is not involved
> in developing them?
>
> Do you not, for example, consider the canonical ordering of combining marks
> and the bi-directional algorithm protocols? [Actually, I've always thought
> that one of the great benefits of Unicode was the value added by the various
> "protocols" (or "properties" and "rules" in Unicode parlance) it develops on
> top of ISO 10646.]

The canonical ordering algorithm, the bidirectional algorithm, the
Unicode collation algorithm, and the like, are *algorithms*, rather
than *protocols*.

Yes, much of the additional value of the Unicode Standard is in this
kind of material, but there are differences between these things -- they
are not all, willy-nilly, protocols.

"Properties" are attributes of characters. Those attributes then interact
with algorithms to help define textual behavior.
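
To make "attributes of characters" concrete, here is a small sketch that looks up a few properties through Python's standard unicodedata module. The helper name describe and the three sample characters are my own choices for illustration; the values come from whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

def describe(ch):
    """Collect a few character properties for one code point (illustrative helper)."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),      # General_Category
        "combining": unicodedata.combining(ch),    # Canonical_Combining_Class
        "bidi": unicodedata.bidirectional(ch),     # Bidi_Class
    }

# A Latin letter, a combining mark, and a Hebrew letter.
for ch in ("A", "\u0301", "\u05D0"):
    print(f"U+{ord(ch):04X}", describe(ch))
```

The combining class is what the canonical ordering algorithm consults, and the bidi class is what the bidirectional algorithm consults; the properties by themselves do nothing until an algorithm uses them.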

"Rules", in the context of the Unicode Standard, are generally defined as
pieces of algorithms.

"Algorithms", in the context of the Unicode Standard, are usually ordered
sets of rules operating on characters (and their properties), to produce
desired behavior of text in reliable, reproducible ways in implementations.
Thus *normalization* is defined as a detailed algorithm by the Unicode
Technical Committee, so that implementation A which claims to be normalizing
Unicode text and implementation B which also claims to be normalizing
Unicode text can be expected to have comparable results. Furthermore, there
is then a well-defined place where an umpire can go to determine who is
right if implementations disagree in their handling.
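
As a rough illustration of what "comparable results" means for normalization, the sketch below uses Python's standard unicodedata.normalize. The helper same_after and the sample strings are assumptions made for this example only; any conformant implementation should agree on the outcome.

```python
import unicodedata

precomposed = "\u00E9"    # e-acute as a single code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

def same_after(form, a, b):
    """True if a and b are identical once both are normalized to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(precomposed == decomposed)                   # raw code points differ
print(same_after("NFC", precomposed, decomposed))  # equal after composition
print(same_after("NFD", precomposed, decomposed))  # equal after decomposition

# Canonical ordering is one step of normalization: marks are sorted by
# combining class, so dot below (class 220) moves ahead of acute (class 230).
unordered = "a\u0301\u0323"
reordered = unicodedata.normalize("NFD", unordered)
print([f"U+{ord(c):04X}" for c in reordered])
```

Two programs that start from different code point sequences end up with the same normalized string, and the standard is the place to check if they do not.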

"Protocols" usually have to do with the specification of agreed-upon rules
of behavior in an interacting context, so that when A delivers something
to B that purports to follow Protocol X, B knows exactly what format to
expect, what order things will be delivered in, and how to respond, if
necessary. An obvious and highly visible example of a protocol is SMTP,
which, along with its companion message format (RFC 822), defines what the
"From:", "Date:", "Subject:", "To:" and related fields
in email mean, how they are formatted and delivered, what kinds of
errors should be handled, where errors are delivered, and so on and
so on, in order for email to work.
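
For contrast with the algorithm case, here is a sketch of the receiving side of that agreed-upon format, using Python's standard email package. The raw message is a made-up reconstruction: the "To:" address and the numeric time zone in "Date:" are assumptions, and header_summary is a hypothetical helper, not part of any library.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Header block, blank line, then body: the layout the receiver is told to expect.
raw = (
    "From: Kenneth Whistler <kenw@sybase.com>\n"
    "To: unicode@unicode.org\n"
    "Subject: Re: Provenance of Unicode\n"
    "Date: Wed, 02 May 2001 19:55:33 -0400\n"
    "\n"
    "Body text goes here.\n"
)

msg = message_from_string(raw)

def header_summary(message):
    """Pick out the fields named in the paragraph above (illustrative helper)."""
    return {name: message[name] for name in ("From", "To", "Subject", "Date")}

print(header_summary(msg))
print(parseaddr(msg["From"]))               # (display name, address)
print(parsedate_to_datetime(msg["Date"]))   # timezone-aware datetime
print(repr(msg.get_payload()))              # the body, after the blank line
```

The parser works only because sender and receiver agree in advance on field names, ordering of header and body, and date syntax, which is the sense of "protocol" intended here.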

While protocols may often incorporate algorithms (rather than just formatting
specifications or call-response chains, and so on), and while some textual
algorithms in the Unicode Standard have protocol-like elements, the UTC
has, to date, considered character properties and textual algorithms to be
within its purview, and has generally considered protocols for text handling
(what are also termed "higher-level protocols" in the book) as outside its
purview.

One of the obvious gray areas was the language tags in Plane 14, which
contain both a character encoding side and a small protocol for their usage.
William Overington obviously spotted this in using the Plane 14 language
tags as a model for his PUA support codes proposal. However, the Plane 14
language tags are a very special case, and not something that the UTC
has shown any delight in or any intention of emulating in the future.
As I stated before, they were born deprecated.

>
> So, where is the line drawn regarding what this list (and by extension, the
> UTC?) will not involve itself when it comes to building on top of simple
> tables of code points?

As Rick pointed out, this is an open list. It will involve itself in
whatever its participants find interesting about the Unicode Standard.

Note, however, that what happens on this list cannot just be extrapolated
into the UTC. Not everything discussed on this list gets dealt with
by the UTC. The UTC works by agendas and decides issues based on its
formal procedures. See:

http://www.unicode.org/unicode/consortium/utc.html

and

http://www.unicode.org/unicode/consortium/utc-procedures.html

An open discussion list like the unicode@unicode.org list tends
to do the following kinds of things well:

1. Convey information about the Unicode Standard
2. Answer specific questions about the standard or
    implementation difficulties people are having on
    one platform or another
3. Discuss defects and problems in the standard (although
    there is a specific formal address for defect
    reports: errata@unicode.org)
4. Float new ideas
5. Provide feedback for new (or old) ideas
6. Educate newcomers about the standard
7. Self-police its own rules for proper behavior on the list

What an open discussion list cannot effectively do includes
the following:

1. Standardize anything
2. Create documents (or protocols or algorithms expressed in documents)
3. Come to closure on controversial issues

What I have been trying to convey in some of my responses on the
Overington PUA thread is *not* that it is inappropriate to discuss
protocols on the unicode@unicode.org list -- obviously we have been
doing so for days now, at some length.

Instead, I have been trying to point out that it is hopeless to actually
try to *develop* a protocol on this list. There is a profound difference
between the way *this* open discussion list works and the way
a typical IETF working group discussion list works. In the latter
case, there is a defined timeframe, a moderator, and a promised deliverable
that usually consists of one or more Internet Drafts, iteratively
developed by an author or authors in response to the strictly focussed
discussion on that list. Off-topic maundering on an IETF working group
discussion list is strictly verboten, and can get you kicked off the list.
And when the deliverable (an Internet Draft or an RFC for a protocol, etc.)
is finished, the working group and its discussion list dissolve.

Capiche?

--Ken


