From: Doug Ewell (dewell@roadrunner.com)
Date: Fri May 23 2008 - 19:55:10 CDT
"Behnam" <behnam dot rassi at gmail dot com> wrote:
> I wonder why Unicode didn't put language identifier to the paragraph.
Unicode 3.1 introduced a set of tag characters in the range U+E0000
through U+E007F ("Plane 14"), primarily to allow language tags to be
embedded in plain text, as a defense against an external proposal to use
invalid UTF-8 sequences for that purpose. However, the Plane 14 tag
characters were "strongly discouraged" by Unicode almost immediately
after being encoded, and have since been formally deprecated. For more
information, see sections 5.10 and 16.9 of The Unicode Standard (TUS),
Version 5.0.
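For anyone curious what such a tag looks like in practice, here is a
minimal Python sketch of the mechanism. It relies on two facts about the
block: U+E0001 is LANGUAGE TAG, and U+E0020 through U+E007E mirror
printable ASCII at an offset of 0xE0000. The function names are
hypothetical, invented only for this illustration, and the sketch does
not validate that the string is a well-formed language tag.

```python
TAG_BASE = 0xE0000            # offset from ASCII to the Plane 14 tag characters
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG, introduces a language tag


def encode_language_tag(lang: str) -> str:
    """Return the Plane 14 tag sequence for a language tag such as 'fa'.

    Hypothetical helper: each printable ASCII character is shifted up by
    0xE0000 and the result is prefixed with U+E0001.
    """
    if not all(0x20 <= ord(ch) <= 0x7E for ch in lang):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(ch)) for ch in lang)


def strip_tag_characters(text: str) -> str:
    """Remove every character in U+E0000..U+E007F from text."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


tagged = encode_language_tag("fa") + "salam"
print([f"U+{ord(ch):04X}" for ch in tagged[:3]])
print(strip_tag_characters(tagged))
```

Since the characters are deprecated, the stripping function is probably
the more useful of the two: a process that does not interpret the tags
can simply drop them and keep the visible text.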
I've long been a critic of both the "deprecated at birth" encoding
strategy and the presumption that all interesting text is stored in a
markup language or other high-level format. But these are the rules,
and if you choose to use these tag characters you will probably be
alone.
--
Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages