Re: Is there a UTF that allows ISO 8859-1?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 02 1998 - 15:26:52 EDT


>
> > Ein unbekannter Locale Name wurde übergeben.
> > Ein unbekannter Locale Name wurde übergeben.
>
> > (Note that I was able to cut and paste the UTF-8 string right into
> > the Latin-1 text editor that I am editing this mail in, without any
> > loss of data or complaint from the operating system.)
>
> That's because the ü (U+00FC LATIN SMALL LETTER U WITH DIAERESIS) you
> chose in your example happens to belong to the lucky 50% of non-ASCII
> characters that are expressed with safe values [\xA0-\xFF] in UTF-8.
> I doubt that you would be as successful with the companion capital Ü
> which is encoded as =C3=9C Ü
>
>
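
For reference, the byte values behind that claim can be checked with a short sketch. The helper names `utf8_bytes` and `all_in_a0_ff` are made up for illustration, and "safe" here only means the [\xA0-\xFF] range named in the quoted text:

```python
import quopri


def utf8_bytes(ch: str) -> bytes:
    """Encode a single character as UTF-8."""
    return ch.encode("utf-8")


def all_in_a0_ff(data: bytes) -> bool:
    """True if every byte lies in 0xA0-0xFF, the range the quoted text calls safe."""
    return all(0xA0 <= b <= 0xFF for b in data)


for ch in ("ü", "Ü"):
    data = utf8_bytes(ch)
    # Show the code point, the raw bytes, the quoted-printable form, and the range check.
    print(f"U+{ord(ch):04X}", data.hex(),
          quopri.encodestring(data).decode("ascii"), all_in_a0_ff(data))
```

The second byte of the capital Ü is 0x9C, which falls in the 0x80-0x9F range rather than in [\xA0-\xFF].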

How about:

8 = ZZZZZ, "内部ディレクトリ制御レイヤ・エラー"

Japanese for:

8 = ZZZZZ, "internal directory control layer error"

Did that hit enough non-safe values? It also cut and pasted
right into the Latin-1 text editor with no complaints or
loss of data:

0001220 2022 e586 85e9 83a8 e383 87e3 82a3 e383
0001240 ace3 82af e383 88e3 83aa e588 b6e5 bea1
0001260 e383 ace3 82a4 e383 a4e3 83bb e382 a8e3
0001300 83a9 e383 bc22 0a0a
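
The dump above can be cross-checked with a minimal sketch like the following. It assumes the quoted string is exactly the Japanese text shown earlier; the helper names `utf8_hex` and `c1_range_bytes` are made up for illustration:

```python
# The Japanese message text from the example above.
MESSAGE = "内部ディレクトリ制御レイヤ・エラー"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as a lowercase hex string."""
    return text.encode("utf-8").hex()


def c1_range_bytes(data: bytes) -> list:
    """Collect the bytes in 0x80-0x9F, i.e. outside the [\\xA0-\\xFF] range."""
    return [b for b in data if 0x80 <= b <= 0x9F]


encoded = MESSAGE.encode("utf-8")
print(len(MESSAGE), "characters,", len(encoded), "bytes")
print(utf8_hex(MESSAGE))
print(len(c1_range_bytes(encoded)), "bytes fall in 0x80-0x9F")
```

Each character takes three bytes in UTF-8, and most of them carry at least one byte in the 0x80-0x9F range.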

These 17 Japanese characters also survived getting schlepped into
a UNIX file system, pushed over to a Windows NT file system, and
displayed in a browser that understands UTF-8.

The point is that UTF-8 works pretty darn well with existing
8-bit clean software and file systems, even if they are completely
unaware of UTF-8 itself.
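
A rough way to model that point, as a sketch only: the two functions below are hypothetical stand-ins for real tools, one that treats every byte as a Latin-1 character and passes it through untouched, and one that is not 8-bit clean and clears the high bit.

```python
def through_latin1_tool(raw: bytes) -> bytes:
    """Model an 8-bit clean tool that sees each byte as one Latin-1 character."""
    as_seen_by_tool = raw.decode("latin-1")   # every byte value 0x00-0xFF maps to a character
    return as_seen_by_tool.encode("latin-1")  # and maps back to the same byte


def through_7bit_tool(raw: bytes) -> bytes:
    """Model a tool that is NOT 8-bit clean and clears the high bit of each byte."""
    return bytes(b & 0x7F for b in raw)


original = "内部ディレクトリ制御レイヤ・エラー"
raw = original.encode("utf-8")

# The Latin-1 tool never understood the text, yet the UTF-8 bytes survive intact.
print(through_latin1_tool(raw).decode("utf-8") == original)

# The 7-bit tool destroys the multi-byte sequences.
print(through_7bit_tool(raw) == raw)
```

The round trip works only because the tool preserves all eight bits of every byte; it does not need to know anything about UTF-8.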

--Ken


