Re: UTF-8, C1 controls, and UNIX

From: DougEwell2@cs.com
Date: Thu Mar 01 2001 - 01:16:37 EST


In a message dated 2001-02-28 15:13:02 Pacific Standard Time,
fdc@columbia.edu writes:

> > Maybe one should make a transmission safe UTF that left C1 alone?
> >
> Remember this? --
>
> From: Markus Scherer <markus.scherer@jtcsv.com>
> To: "Unicode List" <unicode@unicode.org>
> Date: Mon, 10 Apr 2000 15:23:53 -0800 (GMT-0800)
> Subject: What if UTF-8 had been defined after UTF-16?
>
> What if UTF-8 had been defined just for the code point range 0..0x10ffff?
> What if UTF-8 had been designed to be not just "File-System-Safe" but also
> "Terminal-Safe"?

Keld may have been referring to his own "UTF-7d5", described at
<http://www.uni-mainz.de/~knappen/jk009.html>. Like UTF-8, it can express
any character in the Basic Multilingual Plane in no more than 3 code units,
but unlike UTF-8 it requires the additional layer of UTF-16 to express
supplementary characters: each one becomes a surrogate pair, and each
surrogate takes 3 code units, so a supplementary character takes 6.
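
To make the "6 code units" arithmetic concrete, here is a rough Python
sketch of the UTF-16 layer only. It does not implement UTF-7d5 itself; the
helper name utf16_code_units and the constant of 3 bytes per surrogate are
my own assumptions for illustration, taken from the figures quoted above.

```python
# Sketch of the UTF-16 layer only; this is NOT an implementation of UTF-7d5.
# utf16_code_units and BYTES_PER_SURROGATE are hypothetical names.

def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units for a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Assumption from the text above: each surrogate costs 3 code units
# in an encoding that is layered on top of UTF-16.
BYTES_PER_SURROGATE = 3

bmp_units = utf16_code_units("\u20ac")       # EURO SIGN, a BMP character
supp_units = utf16_code_units("\U00010000")  # first supplementary character

# A supplementary character becomes a surrogate pair, so 2 * 3 = 6.
supp_cost = len(supp_units) * BYTES_PER_SURROGATE
```

A BMP character yields one UTF-16 code unit and a supplementary character
yields two, which is where the doubling comes from.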

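UTF-1, the original UTF, was also designed not to use C0 or C1 bytes, or
space or DEL, except to represent themselves. However, its multibyte
sequences could contain the byte 0x2F ("/"), which breaks UNIX path
handling. When UTF-8 replaced it, fixing this "slash" issue was apparently
deemed more critical than avoiding C1 bytes.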

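The trade-off is easy to see with Python's built-in UTF-8 codec. The sketch
below is only an illustration; the helper names c1_bytes and
has_stray_slash are hypothetical, not part of any library.

```python
# Illustration only: c1_bytes and has_stray_slash are hypothetical helpers.

C1_RANGE = range(0x80, 0xA0)  # byte values that coincide with C1 controls

def c1_bytes(text):
    """Return the bytes of the UTF-8 encoding that fall in the C1 range."""
    return [b for b in text.encode("utf-8") if b in C1_RANGE]

def has_stray_slash(text):
    """True if the UTF-8 bytes contain 0x2F although the text has no '/'."""
    return 0x2F in text.encode("utf-8") and "/" not in text

euro_c1 = c1_bytes("\u20ac")    # EURO SIGN encodes as E2 82 AC
agrave_c1 = c1_bytes("\u00c0")  # LATIN CAPITAL A WITH GRAVE encodes as C3 80
ascii_c1 = c1_bytes("usr/bin")  # pure ASCII never produces C1 bytes

slash_leak = has_stray_slash("\u20ac\u00c0\U00010000")
```

UTF-8 trail bytes run from 0x80 to 0xBF, so they overlap the C1 range, but
every byte of a multibyte sequence has the high bit set, so 0x2F can only
ever mean a real slash.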
-Doug Ewell
 Fullerton, California


