Re: Generic Tagging: A Modest Proposal

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jul 15 1997 - 15:17:54 EDT


John Cowan responded re the Plane 14 proposal for
tagging:

> I find it
> quite reasonable as you describe it, EXCEPT:
>
> > 3. Rather than introducing a completely generic public
> > tag scheme, with path characters, reversed domain
> > names, and then arbitrary tags per domain, we concluded
> > that there was really only a clamor for a couple of
> > kinds of tags: language tags and character source set
> > tags. Maybe one or two others might be useful. So we
> > introduced particular tag types rather than a completely
> > open mechanism.
>
> I think this illustrates a great lack of foresight.
> As Michael Everson has said several times, 10646 should
> last for centuries: if people have only clamored for
> two types of tags so far, that is because we are
> essentially in the pre-implementation phase still.
> When Unicode/10646 is really pervasive, new types of
> meta-information will be popping out of the woodwork,
> and at that point we will have three choices:
>
> 1) retrofit a registry scheme, grandfathering
> whatever tags are de facto in use;
> 2) use the slow, slow process of amendment to
> get more "particular tag types" embedded in the
> character set;
> 3) breathe a sigh of relief, because we PLANNED AHEAD
> by making sure that an open tag-creation
> mechanism existed from Day 0.

What you may be missing here is that the tagging
proposal is deliberately limited because there already
are open-ended tagging mechanisms available: SGML,
and its kith, HTML, XML, etc. When new types of
meta-information pop out of the woodwork, SGML and
XML can easily generate new tags for them.

The Plane 14 proposal is to address a limited requirement
for in-band tagging that is "lighter-weight" than SGML
or text/extended, etc., mostly for Internet protocol development,
and mostly for language tagging. It is aimed at
removing barriers to the universal use of Unicode on
the Internet, rather than at encouraging the use of
these kinds of tags when well-defined alternatives are
available.
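
As a rough illustration of how lightweight this kind of in-band
tagging can be, here is a minimal sketch. It assumes the code point
layout later standardized for the Plane 14 tag characters (U+E0001
LANGUAGE TAG, followed by tag characters that mirror printable ASCII
at an offset of U+E0000); that layout is not spelled out in this
thread, and the function names are hypothetical.

```python
# Sketch only: assumes the later-standardized Plane 14 tag layout.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG (assumed layout)
TAG_BASE = 0xE0000       # tag characters mirror ASCII at this offset
TAG_RANGE = range(0xE0000, 0xE0080)


def encode_language_tag(lang: str) -> str:
    """Return the in-band tag sequence for a language code such as 'fr'."""
    if not lang or any(not (0x20 <= ord(c) <= 0x7E) for c in lang):
        raise ValueError("language code must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def strip_tags(text: str) -> str:
    """Drop every Plane 14 tag character, leaving the plain text."""
    return "".join(c for c in text if ord(c) not in TAG_RANGE)


tagged = encode_language_tag("fr") + "bonjour"
plain = strip_tags(tagged)
```

A process that does not care about the tags can discard them with a
single range check, which is the sense in which the scheme is
lighter-weight than parsing markup.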

Your point 2 is *deliberately* built into the Plane 14
proposal, because we don't want a proliferation of
embedded character tag types, when people should be
making use of already-existing "higher-level protocols"--
following Unicode design principles.

>
> Unicode is a brilliant example of what can be done
> if we don't plan too conservatively. Having a few
> kinds of tags violates the "Zero-One-Many" design
> pattern ("Allow none of, one of, or as many as
> possible of a given thing", roughly speaking).

Generally correct, but once you allow as many as
possible of this particular thing, you start to imply
registries to keep track of them as well. We are
thinking more along the lines of Zero-One-Two/Roll Your
Own Privately, with no intention of going beyond
Two unless a very cogent argument for an embedded
tag type requiring standardization can be brought
through the standards committees.

--Ken

>
> I urge the UTC, or the members of it present on this
> mailing list, to consider this point. I will be happy
> to make this a formal submission if someone will tell
> me how to go about it.
>
> --
> John Cowan cowan@ccil.org


