Re: Comments on <draft-ietf-acap-mlsf-00.txt>?

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Thu Jun 05 1997 - 07:10:33 EDT


I'm happy to see that the discussion has turned away from
initial upsets and flaming. Chris has been very instrumental
in arriving at the right solution for UTF-8 URLs for IMAP, and
is working in other places to help get Unicode accepted and
used widely.

I had a chance to read his MLSF draft before it became public;
I had both positive and negative feelings about it, but wasn't
able to express them very well at that time and didn't have
any better solution yet. I'll talk about technical issues
in another mail; here I'll mainly address the requirements.

> Chris Newman <Chris.Newman@innosoft.com> wrote:
>
> On Wed, 4 Jun 1997, David Goldsmith wrote:
> > Chris Newman (Chris.Newman@innosoft.com) wrote:
> > >What about a multi-valued attribute where each value may be in multiple
> > >languages?
> >
> > Since Unicode can support multiple languages, can you give an example
> > where language tagging is necessary *and* there is only plain text
> > present?
>
> Take an "alternate names" attribute in a personal addressbook, which may
> be multivalued. Each of these multiple names may also be represented in
> different languages. Fonts, styles, and other viewer based attributes are
> completely unnecessary as they don't have anything to do with the name.
> But the language of the name representation is necessary to select the
> appropriate variant string.
>
> This needs a solution that is above the character stream level and below
> the application protocol level.

I think that in several places in different application protocols,
there might be a need for the functionality of MLSFs.

However, it is very important for each application protocol to
consider exactly what its needs with respect to language are.
In some cases, e.g. HTTP, it's important to synchronize the
server to the language needs of the user; for this we need
things such as Accept-Language. Sending back pages, or even
only a short warning string, in a lot of alternatives, and
doing the selection at the client, would become increasingly
inefficient as the number of supported languages grows.
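
To make the server-side selection concrete, here is a minimal Python
sketch, not a definitive implementation. It assumes a simplified
Accept-Language syntax (comma-separated tags with an optional ";q="
weight) and ignores the "*" wildcard. The function names, the
WARNINGS table and its strings are hypothetical, chosen only for
illustration.

```python
def parse_accept_language(header):
    """Return language tags from the header, best preference first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        q = 1.0  # a tag without an explicit weight counts as q=1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        prefs.append((-q, position, tag.strip().lower()))
    # Sort by descending q, then by position in the header; drop q=0.
    return [tag for neg_q, _, tag in sorted(prefs) if neg_q < 0]


def choose_variant(header, available, default):
    """Pick one stored variant on the server instead of sending them all."""
    for tag in parse_accept_language(header):
        if tag in available:
            return available[tag]
        primary = tag.split("-")[0]  # fall back from "en-us" to "en"
        if primary in available:
            return available[primary]
    return available[default]


# Hypothetical warning strings held by the server.
WARNINGS = {
    "en": "Quota exceeded",
    "de": "Kontingent ueberschritten",
    "ja": "Yoryo seigen o koemashita",
}

print(choose_variant("ja, en;q=0.5", WARNINGS, "en"))
```

The server sends a single string however many languages it supports,
which is the efficiency argument made above.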

In the case of ACAP, as far as I understand it, the information
will mostly be private information of users that they have
entered into a kind of "database" and that they can retrieve
later. The main doubt this raises for me with respect to language
tagging is "How many users will be ready to tag all their data
if they themselves know what language it is anyway?".

With respect to the example of alternate names, I think it is
definitely a good thing to be able to store the name of a
person e.g. both in Japanese and in ASCII. But that immediately
gives rise to questions. For example, in terms of language,
even the ASCII form is still Japanese, isn't it? That is,
if proper names have a language at all. The functionality
we would like to have here could be:
- Use ASCII on a device (e.g. a palmtop) that doesn't support
        Japanese. This is a character/glyph repertoire issue,
        not a language issue.
- Use the Japanese name in a Japanese mail, but ASCII if you
        are writing in English. This is a language issue,
        but I am sure there are other features on which
        such switches could be made (e.g. private vs. office
        addresses, addresses in different places for the
        cheapest phonecall,...). I'm not sure whether it
        is best to have language in a string-like format,
        while other aspects might be treated differently
        or ignored.
I guess a better understanding of the interaction between
language and other protocol parameters in the various
application scenarios will help a lot.
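
The two selections above can be sketched separately in Python, as an
illustration only and not as anything ACAP or MLSF defines. The
NameVariant type, its use_when_writing field, the sample name and
the ascii_only check are all hypothetical. The field is deliberately
not called "language", since the ASCII form of a Japanese name is
arguably still Japanese.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameVariant:
    text: str
    use_when_writing: str  # hypothetical context tag, e.g. "ja" or "en"


def for_device(variants, can_render):
    """Repertoire issue: first variant the device can display at all."""
    for variant in variants:
        if all(can_render(ch) for ch in variant.text):
            return variant.text
    return None


def for_message(variants, message_language):
    """Language issue: variant matching the language being written in."""
    for variant in variants:
        if variant.use_when_writing == message_language:
            return variant.text
    return variants[0].text  # no match: fall back to the first entry


# Hypothetical address book entry with two alternate names.
ALTERNATE_NAMES = [
    NameVariant("山田花子", use_when_writing="ja"),
    NameVariant("Hanako Yamada", use_when_writing="en"),
]


def ascii_only(ch):
    """Stand-in for a palmtop that can only show ASCII."""
    return ord(ch) < 128


print(for_device(ALTERNATE_NAMES, ascii_only))
print(for_message(ALTERNATE_NAMES, "ja"))
```

Note that for_device never looks at the tag and for_message never
looks at the characters. Other switches (private vs. office, cheapest
phone call) could be further fields of the same kind.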

A few words here also on the CJK "issue". It is in general
overestimated, of course by those who think it is a problem
(whether originally or as a consequence of having heard others
speak of it as a problem), but also by those who become alarmed
when they see it raised as a criticism of Unicode/ISO 10646.

In various IETF WGs, it has been very controversial, but I
think things have changed a lot. Only very few people still
oppose Unicode/ISO 10646, and the way they voice their
opposition mostly makes their lack of arguments very apparent.

With respect to ACAP, and in particular to names, I have serious
doubts that it is necessary to identify glyph differences that
may (or may not) depend on typographic practice in various areas
where ideographs are used. Newspapers and other publications in all
these regions routinely use local typographic conventions for
foreign names without any problem, in many cases with larger
differences than those that could be induced by Han unification.
If a Japanese has to tolerate that her name appears slightly
differently in a Chinese newspaper (and vice versa), I think
it should be no problem to tolerate that it appears similarly
differently on the screen of a Japanese mail correspondent.

> Were it encoded into the character
> stream, that would result in quoting problems which would destroy
> server-side searching capabilities. Were it put at the application
> protocol level it would require a datastructure so complicated as to be
> completely impractical and unusable. In addition, putting it at the
> application protocol level means that a different solution for the same
> problem will be necessary in each application protocol.

Not every application protocol may need a different solution. But
each application protocol should carefully consider the interaction
of language with other protocol parameters and features, and think
things through. If several protocols can share the same solution,
that's fine, but to think that a single solution can just be
plugged in everywhere is dangerous.

Regards, Martin.


