Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Wed Aug 12 1998 - 06:06:25 EDT


We have had this Latin-1-compatible-UTF discussion several times before;
it is certainly not a new idea (check the news:comp.std.internat
archives):

 - You will always need new software, no matter whether you use UTF-8 or
   some fancy encoding in which ISO 8859-1 files do not have to be
   recoded. So don't overestimate the practical advantages of the
   illusion of backwards compatibility. UTF-8's backwards compatibility
   with ASCII matters only because a number of ASCII characters, such as
   NUL and SOLIDUS, have special functions in software that is otherwise
   completely ignorant of the character set. No Latin-1 character has
   such special semantics in any software I am aware of (I have yet to
   see a SHY implementation that can't be easily deactivated).
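The NUL/SOLIDUS point can be checked directly: in UTF-8 every byte of a multi-byte sequence has its high bit set, so bytes 0x00 and 0x2F can only ever stand for U+0000 and U+002F themselves. A minimal Python sketch, assuming only the standard "utf-8" codec; the function name `multibyte_sequences_avoid_ascii` and the example path are hypothetical illustrations.

```python
def multibyte_sequences_avoid_ascii(limit: int = 0x10000) -> bool:
    """Return True if no code point in U+0080..limit-1 encodes to a byte < 0x80."""
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded in UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True

# A path with non-ASCII names still splits correctly on the b"/" byte,
# so byte-oriented, charset-ignorant code keeps working.
path = "/home/j\u00fcrgen/\u8cc7\u6599/notes.txt"
components = path.encode("utf-8").split(b"/")

# By contrast, UTF-16 puts a 0x00 byte into plain ASCII text:
# "A".encode("utf-16-be") == b"\x00A"
```

This is exactly why ASCII compatibility buys something in practice, whereas passing Latin-1 through unchanged protects no comparable byte values.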

 - UTF-8 has a number of very neat properties that cannot be obtained
   with any of the proposals for a Latin-1-compatible encoding,
   especially the combination of self-synchronization, compactness (at
   most 3 bytes per character for the 16-bit BMP), and preservation of
   the UCS-4 lexical string order (important for things such as B-trees
   in DBMSs).
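These three properties can each be seen in a few lines. The sketch below assumes Python 3, whose strings compare by code point (i.e. in UCS-4 order); the helper names `is_continuation` and `char_start` are hypothetical. Self-synchronization means a reader dropped at any byte offset can find the start of the current character by skipping back over continuation bytes (bit pattern 10xxxxxx).

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80

def char_start(buf: bytes, i: int) -> int:
    """Move back from byte offset i to the first byte of the character containing it."""
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i

# Compactness: every BMP character (U+0000..U+FFFF) needs at most 3 bytes.
bmp_max = max(len(chr(cp).encode("utf-8"))
              for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF)

# Order preservation: sorting by UTF-8 bytes gives the same result
# as sorting by code points.
words = ["zebra", "\u00e9t\u00e9", "\u4e2d", "\U0001F600", "abc", "\uffff"]
same_order = sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))
```

Because byte order equals code point order, a B-tree or `memcmp`-based index can store UTF-8 keys without decoding them.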

If you really need a Latin-1-compatible UTF, then just use UTF-7 but do
not transform the characters in the 0x80-0xFF range. This is a
straightforward modification of UTF-7, and it requires changing just one
or two bytes in a UTF-7 implementation. This technique is so obvious and
trivial that it is not even worth writing a formal specification for
it.
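For illustration only, here is a minimal Python sketch of that modified UTF-7. It assumes RFC 2152-style rules with some simplifications of my own: all printable ASCII except "+" (plus TAB, CR, LF) is written directly, a literal "+" is written as "+-", other non-Latin-1 characters go into a "+...-" section holding base64 of their UTF-16BE form, and every section is closed with an explicit "-". The only change from plain UTF-7 is that U+0080..U+00FF are also written as single raw bytes, so the output is no longer 7-bit clean. The function names are hypothetical, and there is no error handling.

```python
import base64

def _is_direct(ch: str) -> bool:
    """Characters written as one byte: printable ASCII except '+', TAB/CR/LF,
    and (the Latin-1 modification) everything in U+0080..U+00FF."""
    if ch == "+":
        return False
    cp = ord(ch)
    return 0x20 <= cp <= 0x7E or ch in "\t\r\n" or 0x80 <= cp <= 0xFF

def encode_utf7_latin1(text: str) -> bytes:
    out = bytearray()
    run: list[str] = []  # pending characters for the next base64 section

    def flush() -> None:
        if run:
            data = "".join(run).encode("utf-16-be")
            out.extend(b"+" + base64.b64encode(data).rstrip(b"=") + b"-")
            run.clear()

    for ch in text:
        if ch == "+":
            flush()
            out.extend(b"+-")  # UTF-7 spells a literal '+' as "+-"
        elif _is_direct(ch):
            flush()
            out.append(ord(ch))  # code point <= 0xFF: the Latin-1 byte itself
        else:
            run.append(ch)
    flush()
    return bytes(out)

def decode_utf7_latin1(data: bytes) -> str:
    out: list[str] = []
    i = 0
    while i < len(data):
        if data[i] != ord("+"):
            out.append(chr(data[i]))  # ASCII or Latin-1 byte, taken as is
            i += 1
            continue
        end = data.index(b"-", i + 1)  # ValueError if the section is unterminated
        chunk = data[i + 1:end]
        if chunk:
            padded = chunk + b"=" * (-len(chunk) % 4)  # restore stripped padding
            out.append(base64.b64decode(padded).decode("utf-16-be"))
        else:
            out.append("+")
        i = end + 1
    return "".join(out)
```

With this, `encode_utf7_latin1("caf\u00e9")` is byte-for-byte the same as the Latin-1 file, while `"\u20ac"` becomes `b"+IKw-"`, just as in ordinary UTF-7.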

I hope it will not become popular. Another UCS encoding is certainly not
what the world has been waiting for.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org,  home page: <http://www.cl.cam.ac.uk/~mgk25/>
