What if UTF-8 had been defined after UTF-16?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Apr 10 2000 - 19:25:26 EDT


What if UTF-8 had been defined just for the code point range 0..0x10ffff?
What if UTF-8 had been designed to be not just "File-System-Safe" but also "Terminal-Safe"?

UTF-8 could have had all the nice features that it has now, plus:
- C1 control codes (0x80..0x9f) passed through as single bytes
- no sequences longer than 4 bytes, BMP still covered with 3 bytes
- no need to check for code points > 0x10ffff, because
  it could have been designed for just that range
- no minimum-length problem (no overlong, non-shortest-form sequences) -> no related security concerns
- all byte values used for some encoding
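For contrast with the list above, here is a small sketch of the standard UTF-8 behaviors the bullets react to. It does not implement the alternative encoding from the linked page, whose details are not reproduced here. It assumes the original UTF-8 bit layout (RFC 2279, current when this was posted), and the helpers utf8_encode_original and naive_decode_one are hypothetical, written only for illustration.

```python
# Sketch of standard UTF-8 behaviors that the "what if" list contrasts against.
# utf8_encode_original and naive_decode_one are hypothetical helpers.

def utf8_encode_original(cp: int) -> bytes:
    """Encode cp with the original (RFC 2279) UTF-8 bit layout, which
    accepted values up to 0x7FFFFFFF in sequences of up to 6 bytes.
    Like that older design, it does no 0x10FFFF or surrogate check."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit range of original UTF-8")
    if cp < 0x80:
        return bytes([cp])  # ASCII: a single byte
    # (sequence length, largest code point that fits, lead-byte prefix)
    for length, limit, prefix in ((2, 0x7FF, 0xC0), (3, 0xFFFF, 0xE0),
                                  (4, 0x1FFFFF, 0xF0), (5, 0x3FFFFFF, 0xF8),
                                  (6, 0x7FFFFFFF, 0xFC)):
        if cp <= limit:
            trail = []
            for _ in range(length - 1):
                trail.append(0x80 | (cp & 0x3F))  # 10xxxxxx, low bits first
                cp >>= 6
            return bytes([prefix | cp] + trail[::-1])
    raise AssertionError("unreachable")


def naive_decode_one(seq: bytes) -> int:
    """Decode the first sequence in seq, deliberately skipping the
    shortest-form (minimum-length) check and all other validation."""
    lead = seq[0]
    if lead < 0x80:
        return lead
    n = 0
    while lead & (0x80 >> n):  # leading 1 bits give the sequence length
        n += 1
    cp = lead & (0x7F >> n)  # payload bits of the lead byte
    for b in seq[1:n]:
        cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits per trail byte
    return cp


if __name__ == "__main__":
    # C1 control NEL (U+0085) takes two bytes in standard UTF-8.
    print("\u0085".encode("utf-8"))            # b'\xc2\x85'
    # Overlong two-byte form of '/' (U+002F): the naive decoder accepts it...
    print(hex(naive_decode_one(b"\xc0\xaf")))  # 0x2f
    # ...while Python's strict UTF-8 codec rejects it.
    try:
        b"\xc0\xaf".decode("utf-8")
    except UnicodeDecodeError:
        print("strict decoder rejects C0 AF")
    # The original layout happily encodes U+110000, so a range check is needed.
    print(utf8_encode_original(0x110000).hex())  # f4908080
```

The sketch only shows that, in UTF-8 as deployed, the shortest-form check and the 0x10ffff range check are extra work a decoder has to remember; the bullets above describe a design in which neither check would be needed.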

It would have been possible. Interested? See http://www.mindspring.com/~markus.scherer/utf-8c1.html .

Note: This is _not_ an approved UTF. I am _not_ proposing this as a new UTF. This is _not_ compatible with any existing UTF or other Unicode implementation. It is just a play with bits and bytes, a "what if", a "Gedankenexperiment".

Just to share a thought -

markus


