Finally got around to reading the MLSF Internet Draft.
Couple of comments:
1) One thing really made me jump: the first sentence in the Abstract.
"While UTF-8 solves most internationalization (I18N) problems, ..."
That makes as much sense to me as saying that QuotedPrintable solves
most I18N problems for Western Europe. It's not QP which does that,
it's ISO 8859-1. QP is just one way to encode 8859-1 text so it can
pass through most mail relays without corruption. But Base64 is another
way to do the same thing (which can make statistical sense for some
languages).
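To make the point concrete: QP and Base64 are interchangeable, reversible encodings of the very same 8859-1 bytes, and which one is more compact depends on how much of the text is non-ASCII. A quick sketch in Python (my illustration, not anything from the draft), using the standard quopri and base64 modules:

```python
import base64
import quopri

# An ISO 8859-1 sample: mostly ASCII with a few accented letters.
text = "Voilà un exemple en français, avec quelques caractères accentués."
raw = text.encode("latin-1")

qp = quopri.encodestring(raw)    # Quoted-Printable: each non-ASCII byte becomes =XX
b64 = base64.b64encode(raw)      # Base64: fixed 4/3 expansion regardless of content

print("raw:", len(raw), "QP:", len(qp), "Base64:", len(b64))

# Both round-trip back to the identical 8859-1 byte string:
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

For mostly-ASCII Western European text QP usually wins; for text that is mostly non-ASCII bytes (as with some languages), Base64's fixed overhead can be the better deal. Either way, the character repertoire is 8859-1's doing, not the encoding's.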
Similarly, it's not UTF-8 which solves the wider problem of
world-wide I18N, it's Unicode (and/or ISO 10646). The canonical
representation of Unicode is 16-bit quantities (UCS-2). UTF-8 is
nothing more than one of many possible transformations (UTF-7 is
another that's already defined: RFC 2152). If I understood right,
UTF-8 was created mainly to make Unicode coexist reasonably well
with existing OSs that use 8-bit characters, for example Unix.
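For what it's worth, here is a minimal sketch (mine, not from any of the RFCs) showing that UTF-8 really is just a bit-level transformation of the 16-bit UCS-2 values; surrogates and the 4-byte forms are omitted for brevity:

```python
def utf8(cp):
    """Encode a UCS-2 code point (U+0000..U+FFFF) as UTF-8 bytes."""
    if cp < 0x80:    # 1 byte:  0xxxxxxx (plain ASCII passes through unchanged)
        return bytes([cp])
    if cp < 0x800:   # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])

# U+00E9 (é) -> C3 A9; U+4E2D (Han character) -> E4 B8 AD
assert utf8(0x00E9) == "é".encode("utf-8") == b"\xC3\xA9"
assert utf8(0x4E2D) == "中".encode("utf-8") == b"\xE4\xB8\xAD"
```

The ASCII-transparency in the first branch is exactly the property that lets UTF-8 coexist with byte-oriented OSs; UTF-7 achieves mail-safety instead, by different trade-offs.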
Not that I agree with the proposal, but the MLSF Internet Draft
should make clear what the implications are of trying to put
language tags into UTF-8 (for example, the assumption that UTF-8 becomes
the canonical representation of Unicode, loss of tagging when
converting to other CESs). I guess the pros and cons have been
discussed at length here.
2) It would have been nice to include a few examples of actual UTF-8
strings with language tags (in hex, of course) in the document.
As to the fundamental issue of whether language tagging belongs in
plain-text Unicode, I must say I'm pretty neutral at this point. I
think they could be useful. But, as Frank was saying, if it's going
to take 10 years to converge to an acceptable solution, then it
doesn't belong in plain text, but at a higher level.
Pierre
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:34 EDT