On Sat, 7 Jun 1997, Martin J. Duerst wrote:
> There are
> likewise good reasons, known widely in the industry, for working
> with a fixed-width process code.
While I can envision a number of good reasons for fixed-width process
code, I'm skeptical that UTF-16 counts as a fixed-width process code and I
see no evidence that UCS-4 is used by the industry.
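A minimal sketch of why UTF-16 is hard to call fixed-width, assuming Python 3 and its built-in codecs. The helper name `utf16_code_units` is made up for illustration. Any code point above U+FFFF takes a surrogate pair, that is, two 16-bit code units rather than one:

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    # "utf-16-be" emits no byte-order mark, so bytes / 2 == code units.
    return len(text.encode("utf-16-be")) // 2


def ucs4_code_units(text: str) -> int:
    """Count the 32-bit code units needed to store text as UCS-4/UTF-32."""
    return len(text.encode("utf-32-be")) // 4


bmp_char = "\u00e9"             # U+00E9, inside the Basic Multilingual Plane
supplementary = "\U00010000"    # U+10000, first code point outside the BMP

print(utf16_code_units(bmp_char))        # 1
print(utf16_code_units(supplementary))   # 2 (a surrogate pair)
print(ucs4_code_units(supplementary))    # 1
```

Only the 32-bit form keeps one code unit per character in both cases.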
> ] 5) Some people have indicated a desire for multi-valued attributes.
>
> Multi-valued attributes seem to be desired. Language alternatives
> are one kind of multi-valued attributes. It would be tedious to
> handle them specially (which would have to be done with MLSF
> alternatives).
Language alternatives have very different semantics from multi-valued
attributes. The issues are orthogonal.
> Metadata would make the best place for language information, wouldn't it?
It would be fine if every attribute contained one and only one language.
I'm not sure it's correct to presume that both alternative language values
and mixed-language values will never ever be needed.
> The values in each entry are short, as we have been told, and this
> means that indeed the possibility that there is multilingual text in
> them that needs to be tagged is low.
Maybe. On the other hand, mixed language error strings are very likely
to occur. What about errors where part of the message comes from a
plug-in or module which doesn't support the client's preferred language?
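A rough sketch of the mixed-language case, under invented assumptions: a server localized in French wraps an error from a plug-in that only speaks English. The `Segment` type, the helper `languages_used`, and the sample strings are all hypothetical and are not taken from ACAP or MLSF. The point is only that a single per-message language label cannot describe such a string, while per-segment tags can:

```python
from typing import List, Tuple

# Hypothetical representation: (RFC 1766 language tag, text) pairs.
Segment = Tuple[str, str]


def languages_used(segments: List[Segment]) -> List[str]:
    """Return the distinct language tags in order of first appearance."""
    seen: List[str] = []
    for lang, _text in segments:
        if lang not in seen:
            seen.append(lang)
    return seen


# The server speaks French; the plug-in's part of the message is English only.
message: List[Segment] = [
    ("fr", "Erreur du module :"),
    ("en", "disk quota exceeded"),
]

print(languages_used(message))   # ['fr', 'en']
```

A message carrying one language label would have to mislabel one of the two segments.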
> Another idea is to take the attribute name structure, and append a
> language tag at the end after an additional dot. This would nicely
> deal with alternatives, and would make language searchable in the
> same way as other things. Maybe a special separator could be defined
> to be used in front of the final language tag, with the special
> semantics that there is no need for a trailing * wildcard in an
> attribute specification but still all the attributes with different
> languages get searched.
This is a *very* complex solution. It breaks the attribute/value model.
It means a client may get multiple attributes back when it searches for a
single one -- forcing a lot of complexity on the client. It probably has
serious performance impact on the server and might very well defeat
a number of good indexing schemes.
> All of the above proposals would solve the bulk of ACAP language
> identification problems in a manner more appropriate to the protocol
> and the data model than MLSF.
I disagree.
> What remains is the language of the Alert and Warning messages.
> For this, the correct solution is language negotiation, i.e.
> the client telling the server about the languages preferred by
> the user, and the server telling the client about the language
> it will use. Alternates in this context are not a solution,
> because they don't scale.
What about mixed language error messages, and alert messages which aren't
available in the client's preferred language? I agree that regardless of
the solution chosen, the client will need to express a preferred language
for error text.
In fact, I've been considering every solution you've proposed for months
and they do not appear to be adequate.
> As I have said, I'm in no way against language tagging.
> But it should be done by considering the structure and
> the needs of the protocol.
The problem is that *every* human-readable string should be labelled with
an RFC 1766 natural language tag according to the IAB charset workshop
recommendations. It seems technically illogical to have to add complexity
to every single protocol to carry the tags out of band from the human
readable strings. Why not just create a format for "human readable strings
which meets the IAB charset workshop recommendations" and solve the
problem for all protocols? I think it's quite clear that the protocol level
is the wrong place to solve this from an architectural standpoint. It
may seem to keep Unicode "purer", but only at the expense of a significant
increase in complexity for *every* Internet protocol carrying human
readable text.
Solving this below the application protocol level is more expressive and
far more practical. The architectural complexity at the protocol level is
so significant that it far outweighs any aesthetic concerns you have with
UTF-8/Unicode.