From: Doug Ewell (dewell@roadrunner.com)
Date: Fri May 23 2008 - 19:55:10 CDT
"Behnam" <behnam dot rassi at gmail dot com> wrote:
> I wonder why Unicode didn't put language identifier to the paragraph.
Unicode 3.1 introduced a set of tag characters in the range U+E0000 
through U+E007F ("Plane 14"), primarily to allow language tags to be 
embedded in plain text, as a defense against an external proposal to use 
invalid UTF-8 sequences for that purpose.  However, the Plane 14 tag 
characters were "strongly discouraged" by Unicode almost immediately 
after being encoded, and have since been formally deprecated.  For more 
information, see sections 5.10 and 16.9 of TUS 5.0.
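To make the mechanism concrete, here is a minimal sketch of how a language tag is spelled with these characters: U+E0001 LANGUAGE TAG, followed by tag characters that mirror printable ASCII at an offset of 0xE0000. The function names and the "fa" example tag are illustrative assumptions, not taken from any library, and the sketch is not an endorsement of using a deprecated mechanism.

```python
# Sketch only: function names are hypothetical, chosen for illustration.
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
TAG_BLOCK = range(0xE0000, 0xE0080)  # U+E0000 through U+E007F


def encode_language_tag(tag: str) -> str:
    """Spell a language tag such as 'fa' using Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(ord(c) + TAG_OFFSET) for c in tag)


def strip_tag_characters(text: str) -> str:
    """Remove every code point in the Plane 14 tag block from the text."""
    return "".join(c for c in text if ord(c) not in TAG_BLOCK)


if __name__ == "__main__":
    tagged = encode_language_tag("fa") + "سلام"
    print([f"U+{ord(c):04X}" for c in tagged[:3]])
    print(strip_tag_characters(tagged))
```

A process that does not understand the tags can ignore them or strip them as above, so the visible text is unchanged either way.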
I've long been a critic of both the "deprecated at birth" encoding 
strategy and the presumption that all interesting text is stored in a 
markup language or other high-level format.  But these are the rules, 
and if you choose to use these tag characters you will probably be 
alone.
--
Doug Ewell * Arvada, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Fri May 23 2008 - 19:56:50 CDT