From: Doug Ewell (dewell@roadrunner.com)
Date: Mon May 26 2008 - 13:39:16 CDT
"Behnam" <behnam dot rassi at gmail dot com> wrote:
> From what I understand, or more precisely from what I don't understand
> (which would be most of it!) I think that your proposal for language
> identifier is very sophisticated and takes part in encoding standard
> scheme.
It's not my proposal, just one I described to you. I think that's what
you meant.
> This is probably why it is facing resistance because it is entering in
> a domain that most applications and their developers consider their
> own.
It has faced resistance because it is stateful -- that is, it applies to
an entire, open-ended chunk of text rather than just a single character
or a small, fixed run of characters -- and the Unicode Consortium
considers stateful mechanisms to be out of scope for a character
encoding standard. There are other mechanisms like this, such as the
Interlinear Annotation characters at U+FFF9 through U+FFFB, and those
are frowned upon as well.
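To make "stateful" concrete, here is a minimal Python sketch of how a reader of such text has to behave. It assumes the Plane 14 scheme: U+E0001 LANGUAGE TAG followed by tag characters in U+E0020..U+E007E that mirror printable ASCII, with U+E007F CANCEL TAG ending the scope. The function names `make_language_tag` and `language_runs` are hypothetical, chosen only for illustration, and the sketch simplifies the cancellation rules.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000     # tag character = ASCII code point + 0xE0000


def make_language_tag(lang):
    """Encode a printable-ASCII identifier such as 'fr' as Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language identifier must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def language_runs(text):
    """Split text into (language, run) pairs.

    The reader must carry state: the language set by one tag applies to
    everything that follows, until another tag or a cancel tag appears.
    """
    runs = []
    current_lang = None   # the state that persists across characters
    buf = []

    def flush():
        if buf:
            runs.append((current_lang, "".join(buf)))
            buf.clear()

    i, n = 0, len(text)
    while i < n:
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            tag = []
            # Collect the tag characters that spell the identifier.
            while i < n and 0xE0020 <= ord(text[i]) <= 0xE007E:
                tag.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            current_lang = "".join(tag) or None
        elif cp == CANCEL_TAG:
            flush()
            current_lang = None
            i += 1
        elif 0xE0020 <= cp <= 0xE007E:
            i += 1            # stray tag character outside a tag: skip it
        else:
            buf.append(text[i])
            i += 1
    flush()
    return runs


sample = (make_language_tag("fr") + "Bonjour. Salut."
          + make_language_tag("en") + "Hello."
          + chr(CANCEL_TAG) + "???")
for lang, run in language_runs(sample):
    print(lang, repr(run))
```

Note that the language of "Salut." cannot be determined by looking at that sentence alone; the reader has to have seen the tag earlier in the stream. That dependency on earlier context is the statefulness being objected to.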
> What I am suggesting is much much simpler, to the point of banality.
> Yet very efficient. But also much more acceptable to all parts. The
> paragraph language identifier that I'm suggesting, doesn't do anything
> in plain text at all. It just sits there as a part of the paragraph
> encoding.
The problem is that in Unicode, there is no concept of "the paragraph
encoding." There is simply a stream of characters. How they are
formatted and interpreted as paragraphs is dependent on a higher-level
protocol or application.
> Only when the paragraph is opened by an application, it can identify
> the language of the paragraph to the application and trigger the
> language support system of that application... or simply be ignored,
> just as in plain text.
You are correct that the tag characters can be ignored in certain plain
text contexts where no advantage can be taken of them. That was one of
the rationales behind burying them in Plane 14, and that strategy was
explicitly mentioned when the characters were introduced.
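As a rough illustration of "can be ignored", a process that has no use for the tags can simply drop everything in the Tags block (U+E0000..U+E007F) and keep the rest of the text. This is only a sketch in Python; the function name `strip_tag_characters` is made up for the example.

```python
def strip_tag_characters(text):
    """Return text with every code point in the Tags block (U+E0000..U+E007F) removed."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# U+E0001 LANGUAGE TAG, then tag characters for "f" and "r", then ordinary text.
tagged = "\U000E0001\U000E0066\U000E0072Bonjour"
plain = strip_tag_characters(tagged)
print(len(tagged), len(plain), plain)
```

Because the whole block sits in one contiguous range far from any ordinary text, a single range check is enough; no parsing of the tag contents is needed.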
> The value of this identifier is just its existence, being there with
> the paragraph, wherever it goes. So an email client knows that this is
> for example a French paragraph. The word processor knows that it is a
> French paragraph and a web-page knows that it is a French paragraph.
> What do they do with this knowledge is totally up to them, with
> regards to whatever support system they already have developed that
> could use of this knowledge and whatever their customers ask them to
> be developed.
Preaching to the choir. As I said, go ahead and use them if you like,
but be aware they are deprecated and there is probably nobody else using
them. I thought they were a great idea, and even I don't use them any
more.
--
Doug Ewell * Arvada, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages