Re: UTF-7,5

Date: Tue Jul 15 1997 - 06:28:51 EDT

Markus Kuhn wrote:

> I absolutely do not care, and you shouldn't either.

I certainly care about the state of my display, printer or terminal.

>C1 characters will
>only cause problems on those receiving systems that do not understand
>UTF-8 at all.

Better to say: do not understand UCS-2/UCS-4 at all. I think that UTF-8, too,
was designed as a migration tool with the UNIX operating system in mind.
Although UTF-8 has the nice feature of being self-segregating (so that
the problem of wrapping double-byte characters does not exist), I don't
think that it will persist forever. It will be succeeded by UCS-2/4 in the
long run.
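
To illustrate what "self-segregating" means here, a minimal sketch in Python (the helper names is_continuation and char_start are made up for this example): continuation bytes in UTF-8 always have the bit pattern 10xxxxxx, so from any byte offset one can step back to the start of the character without decoding from the beginning.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, pos: int) -> int:
    """Step back from an arbitrary byte offset to the first byte of its character."""
    while pos > 0 and is_continuation(data[pos]):
        pos -= 1
    return pos


data = "Jörg".encode("utf-8")   # 4A C3 B6 72 67
# Offset 2 is the second byte of "ö" (C3 B6); its character starts at offset 1.
print(char_start(data, 2))      # 1
```

A line wrapper that applies such a check before breaking never splits a character in the middle, which is the wrapping problem mentioned above.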

>And on those systems, *ALL* non-ASCII characters are messed
>make things any worse. Broken is broken.

But a file containing C1 control characters is considerably more broken
than a file containing only Latin-1 characters. The latter I can view
(because there is a Latin-1 font on my system); I can even analyse a suspect
spot, edit it, and store it as a text file.
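
A small sketch of the difference, assuming the receiving system interprets every byte as Latin-1 (the function name c1_offsets is made up for this example): UTF-8 continuation bytes lie in 0x80-0xBF, so some of them fall into the C1 range 0x80-0x9F, whereas plain Latin-1 text contains no such bytes.

```python
def c1_offsets(data: bytes) -> list:
    """Offsets of bytes that a Latin-1 system reads as C1 controls (0x80-0x9F)."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


word = "Ärger"
latin1 = word.encode("latin-1")  # C4 72 67 65 72
utf8 = word.encode("utf-8")      # C3 84 72 67 65 72

print(c1_offsets(latin1))        # []
print(c1_offsets(utf8))          # [1], the byte 0x84
```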

>I do not think it is good
>practice to support a concept of slightly-less-broken-than-totally-broken
>as you would do it in your UTF-7,5 Latin-1 backwards compatibility.

The main issue is migration, not compatibility. I see the place for UTF-7,5
especially in USENET News, which is currently 8-bit clean. It also helps
that Latin-1 letters are >>almost human readable<<.

>Standardization is the elimination of unnecessary diversity in technical
>specifications, and not the encouragement of technical diversity by
>standardizing lots of different new alternative hacks.

I completely agree. UCS-2/4 is the goal. I am thinking of a migration path to

>By the way, with proper rounding it should be called UTF-7,6 as
>log_2(191) is 7.5774288280357486893 [...]

Nice proposal. I will add a note about this to my web page soon.
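
The arithmetic can be checked with a few lines of Python. This is only a sketch, and it rests on an assumption not stated in the quoted text: that 191 counts the byte values that are neither C0 controls (0x00-0x1F), DEL (0x7F), nor C1 controls (0x80-0x9F).

```python
import math

# Assumption: the 191 usable byte values are those outside C0, DEL and C1.
usable = [b for b in range(256)
          if not (b < 0x20 or b == 0x7F or 0x80 <= b <= 0x9F)]

bits = math.log2(len(usable))       # information content per byte, in bits
print(len(usable), round(bits, 1))  # 191 7.6
```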

--J"org Knappen
