From: Doug Ewell (doug@ewellic.org)
Date: Sat Jan 10 2009 - 23:37:57 CST
Michael D'Errico <mike dash list at pobox dot com> wrote:
> Well people say if you want to encode non-plain-text things, then you
> need to start your own standard. Plain text is a subset of everything
> you would want to encode, so it makes sense to include everything from
> Unicode in this new standard. Trying to minimize the effort required
> to implement a new standard, it also makes sense to utilize the UTF-8
> mechanism (without the 17 plane artificial limitation placed on it) to
> access the Unicode part as well as the new non-plain-text part. There
> is nothing "evil and dangerous" about it, just unfamiliar and
> untested.
If you make it look like UTF-8, people and programs will treat it as if
it were UTF-8 and try to feed it into processes built to handle UTF-8.
That's what is evil and dangerous.
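
To make that concrete, here is a minimal Python sketch. It assumes the proposed format simply reuses the UTF-8 bit pattern without the U+10FFFF ceiling; the names encode_extended and strict_decode_ok are hypothetical and are not taken from any proposal or library.

```python
# Hypothetical "uncapped UTF-8" encoder: same lead/continuation byte layout
# as UTF-8, but allowing values above U+10FFFF (up to 6 bytes, 31 bits).
_FORMS = (
    (2, 0x800, 0xC0),
    (3, 0x10000, 0xE0),
    (4, 0x200000, 0xF0),
    (5, 0x4000000, 0xF8),
    (6, 0x80000000, 0xFC),
)


def encode_extended(value: int) -> bytes:
    """Encode an integer using the UTF-8 bit pattern with no 17-plane cap."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 0x80:
        return bytes([value])
    for nbytes, limit, lead in _FORMS:
        if value < limit:
            tail = []
            for _ in range(nbytes - 1):
                tail.append(0x80 | (value & 0x3F))  # continuation byte
                value >>= 6
            return bytes([lead | value] + tail[::-1])
    raise ValueError("value does not fit in 6 bytes")


def strict_decode_ok(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


inside = encode_extended(0xE9)       # U+00E9, identical to real UTF-8
outside = encode_extended(0x110000)  # first value past the Unicode range

print(inside.hex(), strict_decode_ok(inside))
print(outside.hex(), strict_decode_ok(outside))
print(repr(outside.decode("utf-8", errors="replace")))
```

For anything inside the Unicode range the bytes are indistinguishable from real UTF-8, so nothing warns the receiving process which standard it is looking at. The first value past U+10FFFF is rejected by a strict decoder, and a decoder run with replacement error handling turns it into U+FFFD, losing the data without raising an error.
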
A hypothetical "Everycode" standard that encodes arbitrary bits of data
certainly should include Unicode characters as a subset, but the
encoding format has to be different enough that nobody will be confused
about which standard the data belongs to. Check the mail archives;
there are lots of possible "UTF" ideas that could have been used for
Unicode, but were not, and might make sense for your project instead.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages