Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Sat Apr 28 2001 - 10:50:03 EDT

Next message: Thomas Chan: "IDS question"
Previous message: William Overington: "Re: Tags and the Private Use Area"
Next in thread: Wm Seán Glen: "Re: Provenance of the Unicode Standard and of statements"
Reply: Wm Seán Glen: "Re: Provenance of the Unicode Standard and of statements"
Reply: James Kass: "Re: Tags and the Private Use Area"
Maybe reply: Kenneth Whistler: "Re: Provenance of the Unicode Standard and of statements (derives from Re: Tags and the Private Use Area)"

Kenneth Whistler wrote:

And there have been a couple of no-doubt frustrating responses already.

end quote

No, not frustrating at all. I have found it fascinating. I am seeking to
participate in world class leading edge research work and the number of
contributions to this thread, the variety of opinion, the matters raised and
the potential to learn from the pointers given has been pleasing,
fascinating and very helpful.

Ken continues:

I would like to uplevel briefly here and suggest why the people
on this list are not engaging in the details of Mr. Overington's
proposals so much as questioning the need for such a protocol,
arguing the premises, talking about the role of metadata, and so
on.

end quote

Well, most of the 650 recipients of this list do not participate in most
discussions. I feel that some people will only respond to a posting in a
list if they feel that they disagree or wish to make some particular
additional point. If they agree, then they might just say "fine" to
themselves and spend their time on something else rather than feel a need to
send a posting that just says, "I agree". I am not suggesting that all or
indeed most or even any of the recipients of this list agree with the
suggestion that I made in my document. Many may not even have looked at it.

When putting forward new ideas an inventor should perhaps not expect an
immediate response. I feel that I will have done well if, of the 650
recipients on this list, some have filed the suggestion that I made in the
document of 26 April 2001 under private use area and made a mental note that
my suggestion exists, just in case one day a file coded using it turns up,
and maybe made a note that there is a suggestion about the use of U+100002
and U+100020 .... U+10007F that has been sent round and that, if they
themselves are ever going to make use of the private use area for defining
characters then, at that time, they will take into consideration the
knowledge that that suggestion has been made and might be in use somewhere,
and will make their own decision as to whether to in effect tacitly agree to
it to the limited extent of avoiding *clashing codes* with it, even though
no one else outside any organization for which they work is even aware that
the decision to avoid clashing codes with my suggestion has been made so
that the organization cannot in any way whatsoever be seen to be
endorsing my suggestion.

I am content. I have sent out my idea as it stands and many of the key
companies using unicode may possibly have made a note that the document
exists. I have placed in this posting the URL of our family webspace, so if
they want to check whether the idea is still about then they will be able to
seek to check at the website if they wish.

Ken continues later:

One thing the Unicode discussion list doesn't do is develop
protocols. That is the kind of work that instead often takes place
on temporary Working Group discussion lists in the IETF.

end quote

What please is the IETF?

Ken continues:

While Mr. Overington's initial proposals were couched in terms
of character encoding, it soon became clear to the list and to
him that we weren't talking about standardizing any characters,
but instead a proposal for particular private uses of PUA
characters -- something the UTC and WG2 cannot and will not
endorse, precisely because they *are* private use characters.

end quote

I learned about the idea of using characters within protocols within a
plain unicode text file when the discussion turned towards the matter of
tags. I am a relative newcomer to unicode and am on the learning curve.

The Unicode Consortium cannot and will not endorse a proposal for particular
private uses of PUA characters. That has not been an issue within this
thread. I knew that situation before the thread started.

However, there is something that I feel that the Unicode Consortium could
do, if it so wished, without violating that rule. I suggest that the
Unicode Consortium could, if it so chooses, encode one or more regular
unicode characters together with a protocol so that an author of a file of
unicode plain text that uses any of the codes of the private use area could,
if and only if that author chooses to so state, state in a file of plain
unicode text what meaning the author of that file places upon any private
use area characters that the author uses.

If the Unicode Consortium were to consider making such definitions, then
perhaps I might suggest, for purposes of clarifying what I mean and
providing some examples just in this discussion, there are, at the present
time, three broad possibilities.

1. Define U+E0002 and use the existing tag characters.

2. Promote my suggestion to codes U+E0102 and U+E0120 .... U+E017F.

3. Something else.
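
For readers who want to see the code ranges concretely, here is a minimal Python sketch. It rests on an assumption that the posting does not state: that the suggested codes U+100020 .... U+10007F mirror ASCII 0x20 .... 0x7F by a fixed offset, in the same way that the existing tag characters U+E0020 .... U+E007F do. The function names are hypothetical and are for illustration only; this is not the suggestion document's definition.

```python
# Illustrative sketch only. The fixed-offset mapping is an ASSUMPTION modelled
# on the existing tag characters; it is not taken from the suggestion document.

TAG_BASE = 0xE0000     # tag characters U+E0020..U+E007F mirror ASCII 0x20..0x7F
PUA_B_BASE = 0x100000  # plane 16 private use codes, e.g. U+100020..U+10007F


def shift_ascii(text: str, base: int) -> str:
    """Map each character in 0x20..0x7F to the code point base + its code."""
    out = []
    for ch in text:
        code = ord(ch)
        if not 0x20 <= code <= 0x7F:
            raise ValueError(f"character outside 0x20..0x7F: {ch!r}")
        out.append(chr(base + code))
    return "".join(out)


def unshift_ascii(text: str, base: int) -> str:
    """Inverse of shift_ascii for the same base."""
    out = []
    for ch in text:
        code = ord(ch) - base
        if not 0x20 <= code <= 0x7F:
            raise ValueError(f"code point outside the shifted range: U+{ord(ch):04X}")
        out.append(chr(code))
    return "".join(out)


def is_private_use(code: int) -> bool:
    """True if the code point lies in one of the three private use ranges."""
    return (
        0xE000 <= code <= 0xF8FF          # private use area in the BMP
        or 0xF0000 <= code <= 0xFFFFD     # plane 15 private use
        or 0x100000 <= code <= 0x10FFFD   # plane 16 private use
    )
```

Under that assumption, shift_ascii("A", PUA_B_BASE) gives the single code U+100041, and is_private_use distinguishes the plane 16 codes of the suggestion from U+E0002, which lies in the tag character block and is not private use.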

Now I fully accept that the Unicode Consortium may not wish to do anything
whatsoever about this matter either now or ever and I am not saying or even
suggesting that it should. That is a matter for the Unicode Consortium. I
am simply suggesting that the Unicode Consortium has the power to act in
this matter if it wants to act without violating its non-endorsement policy
and rule.

As things stand, I am unaware of any suggestion other than the one that I
posted that can carry out the task that my suggestion carries out. The
suggestion is as it stands. My intention is to consider any discussion that
arises, maybe improve the suggestion as a result of any such discussions and
then hopefully place it as a document on www.users.globalnet.co.uk/~ngo
which is our family webspace in England. I may well try to think of a novel
new word to designate the document so that web archives will give the site
if anyone should happen to come across the coding being used in the future
and tries to find out more about the coding. This method has worked well
for me already with the word eutotoken. If someone comes across the word
eutotoken anywhere and searches on the web, then our family webspace can
often be found fairly easily.

I intend to make the point on any such web page that the document is not
officially recognized, cannot be officially recognized and will not be
officially recognized. I intend to make the point that it is not an
absolute method of designating the meanings being used for any private use
area codes that are being used, but that it is a method that is available
for use. Unless I am informed that that would not be proper, I
hope to say that, as far as I am aware, the suggestion is not incompatible
with the provisions of the unicode standard regarding the use of private use
area characters. I hope also, for completeness, to draw attention to the
fact that there is a concept called compliance with the unicode system, but
that I am not saying that my idea is compliant.

Ken continues:

And as has become clear in Mr. Overington's latest statement of
what he is proposing, this is really a proposal for a protocol:
a specification of a method for communicating particular interpretations
of rationally segmented portions of the PUA.

end quote

Yes I agree. I like that way of phrasing it.

Ken continues:

As such, this (unicode@unicode.org) is probably the wrong forum
to be trying to discuss, modify, and gain working consensus on
such a protocol proposal. It just isn't that kind of forum.

end quote

A page within the unicode website at www.unicode.org states as follows:

Everybody is welcome to join the public e-mail list to pose questions to the
community of Unicode users.

end quote

Now as I was researching for this reply to your posting and seeking to
establish whether you are right officially or whether that is just a
personal view that you are expressing, and seeing the above sentence that I
have just quoted, I realized that that sentence needs to be interpreted in a
reasonable manner. The questions need to be broadly relevant. One cannot,
as the saying goes here in England, "drive a coach and horses" through that
definition by posing any question about anything in the universe to the
community of Unicode users. Questions need to be relevant. What is a fair
question is open to discussion. For the matter in hand, it never occurred
to me that it was unreasonable for me to ask in this list the community of
Unicode users about the matters in my suggestion document. I am now aware
of your comments and so I have to consider whether continuing with
the matter is reasonable.

Something that I remembered when I was musing on what would be a reasonable
question, and which made me smile as it did when I first heard it, was as
follows.

At about the time that the United Kingdom joined the arrangement that is now
called the European Union, there was a rule that came out about the selling
of vegetable seeds and fruit trees, which rule basically, and I am not an
expert in that field, so this is only for approximate guidance in this
discussion, was that only specific cultivars on certain official lists could
be sold. There was a bit of an outcry on the basis that old varieties of
potatoes, carrots and so on, grown for over a century, would now be illegal
to sell and that these varieties might be lost. However, there was a clause
in the rules that said that seeds not on the official lists could be sold
for experimental purposes and that a small charge could be made to cover
expenses. Well, a society started selling seeds of these varieties to its
members, who were essentially anyone who paid a very modest annual
subscription, "for experimental purposes" for the cost of "expenses" which
were about the same as the price at which packets of seeds were sold in the
shops. This was a topic on the television news, because of the overtones
about purported interference in our way of life from Brussels that crops up
as a topic in the United Kingdom from time to time. I remember the man
being interviewed and with a straight face he said "And the experiment is,
do these vegetables taste better?" I laughed out loud. I suspect that that
was not the kind of experiment that the drafters of the regulations had in
mind!

Ken continues:

unicode@unicode.org doesn't "work on" specific documents as a group,
with the aim of publishing them as standard protocols for general
usage. There is no program of work and no moderator whose job
it is to attempt to solicit and capture consensus and move a
document towards final form.

end quote

Yes.

Ken continues:

The mechanism that is more appropriate to that would be to
take the proposal, rework it as an Internet Draft,
solicit commentary on that document, and then try to develop
consensus *within the IETF* to progress such a document to
a standard protocol.

Of course in such a forum any proposal like this would also
face questions regarding justification and alternatives. And
those might be equally frustrating there.

end quote

I am unaware at present as to what the IETF is and cannot comment on any such
organization. As mentioned previously, I have not found this thread
frustrating. I am, however, unsure of the use of the phrase "Internet
Draft". I am talking about files of plain unicode text which, though they
might be sent as standard text files as an attachment to an email, are not
being discussed in the sense of being for use on the internet, but rather as
files on a stand alone computer.

I would point out though that the unicode specification does specifically
envisage publication in the section about the private use area. The
specification states ".... or they could be published as vendor-specific
character assignments available to applications and end users."

Now, it might perhaps be, and in the light of some of the comments made on
this list in this thread I am considering this possibility, that the Unicode
Technical Committee did not envisage the use of the ".... or they could be
published as vendor-specific character assignments available to applications
and end users." as meaning that someone could start a business of an
electronic typeworks and devise characters not covered by the unicode
specification and publish them. In view of this, I wonder if the Unicode
Consortium might like to please clarify the matter, for the wording of the
specification does seem to be quite clear on this matter and I am indeed
looking at the possibility of starting such a business. Clearly there will
be a lot of work anyway for such a business and I neither want nor need an
environment where what I am trying to do is regarded, for whatever reason,
as if driving a coach and horses through the unicode specification. As far
as I was aware until this week, my idea would be fully alright with the
specification and an entirely proper business to start.

I feel that there is a wider issue. The Unicode Consortium after a great
deal of work with a lot of people taking a lot of time in careful work
produces a specification and publishes that specification. I feel that a
person who is not involved in that preparation work who then uses that
published specification should be able to use that specification in full
confidence that the specification states the situation that exists, without
being constructively told that the definitions mean something else
diametrically opposite on a nudge nudge wink wink basis and that that other
meaning is what everybody means and so on.

Ken continues:

But anyone who comes to the unicode@unicode.org list looking
to actually develop and establish a standard protocol involving
Unicode is looking in the wrong place.

end quote

Well, maybe. I notice that Ken writes using the email address
kenw@sybase.com and that on the unicode website he is listed as a Technical
Director of Unicode with the email address e0105@unicode.org with a mention
of Sybase Inc. noted there.

So, when Ken states the sentence above, is that Ken writing as a private
individual expressing a purely personal opinion, or Ken writing as a
representative of Sybase Inc. or Ken writing as a Technical Director of the
Unicode Consortium stating official Unicode Consortium policy?

I feel that that is an important issue that needs to be clarified. When I
saw the posting from Rick McGowan I did not immediately realize that he was
a Vice President of the Unicode Consortium but only recently noticed that he
used the email address rick@unicode.org so his posting did have the standing
of being from the Unicode organization. I notice though that he did not use
the e0107@unicode.org email address that accompanies his listing as a Vice
President on the unicode website. I just took his postings as being from
one of the participants in the list, a man named Rick, without realizing at
the time that he was a Vice President of the Unicode Consortium.

Recently, I contradicted Asmus Freytag about a matter concerning the use of
the phrase "unrestricted publication". I now find that he is, in fact,
Technical Vice President of the Unicode Consortium. He did not send his
posting from the e0104@unicode.org email address that is on the unicode web
site.

May I suggest that there exists scope for considerable confusion as to the
provenance of a statement made on this list where members of the unicode
user community may well not know who are the directors of the Unicode
Consortium.

May I suggest a published rule that, where an officer of the Unicode
Consortium states something to this list in his or her official capacity,
the person state within the posting their name and the official capacity,
and that, where that information is not present, any such statement is not
to be assumed as carrying the weight of any such official position in the
discussion.

I am genuinely confused by this situation. Ken is a Technical Director of
the Unicode Consortium and has the e0105@unicode.org email address. He
writes using the email address kenw@sybase.com and does not state that he is
a Technical Director of the Unicode Consortium in this posting. Ken makes
statements about what is appropriate posting in this list. Knowing that Ken
is a Technical Director of the Unicode Consortium makes me feel that I
should treat what he says as if it is an official ruling of the Unicode
Consortium that that is how this list is to be used. Yet is that a correct
interpretation? Is Ken just happily and in a friendly manner only seeking
to express a personal view?

As far as I am aware, the unicode@unicode.org list is the only list
available for discussing unicode. As unicode spreads and more and more
people are going to be using unicode and seeking to push forward
technically, the desire of people to discuss such matters may increase. I
wonder if the Unicode Consortium would please address the issue that Ken
states an opinion about. If people cannot legitimately and welcomely
discuss such issues here then surely all that will happen is that someone
will start an alt. newsgroup and the discussions will take place there.
Some of the experts on this list might then think it wise to monitor that
newsgroup and contribute from time to time. It will all be so inefficient
compared with the use of this list for such matters. Yes, there have been
quite a number of postings on the topic that I raised this week and that has
been great. Yet now I feel that there is some sort of indication that I
should not raise such a design issue in this list again. Maybe that is
actually the situation and people generally don't want research and
development discussed in this list. Yet I feel that if that is the policy,
then it should be stated, and if it is not the policy and research and
development can be discussed in this forum then that should be made clear.

I am not seeking to be contentious here; there is genuine confusion and I
feel that a clear policy stated now will be for the long term good as it is
far better for people to know where they stand rather than that uncertainty
needlessly deter people from raising topics that might well be considered
valuable if they were raised and which they are in fact perfectly welcome to
post.

William Overington

28 April 2001


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT