Re: utf-8 != latin-1

From: Steven R. Loomis (srl@jtcsv.com)
Date: Sat Oct 14 2000 - 18:04:08 EDT


Doug Ewell wrote:
> Why? As an illegal UTF-8 sequence, it shouldn't be interpreted as anything.

 It wasn't interpreted as anything. Processing halted at that point
in the text, with an error.
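
 A minimal Python sketch of that behaviour, assuming a strict UTF-8
decoder. The sample word and the helper name try_decode_utf8 are
made up for illustration; only the byte values matter. Text saved as
windows-1252 stores each accented letter as one byte of 0x80 or above,
which is not a valid UTF-8 sequence on its own, so decoding stops at
the first such byte instead of guessing.

```python
# Hypothetical sample: "résumé" as windows-1252 bytes, so each é is the single byte 0xE9.
data = "résumé".encode("cp1252")


def try_decode_utf8(raw: bytes):
    """Return (text, None) on success, or (None, offset of the first bad byte)."""
    try:
        # errors="strict" is the default: an illegal sequence raises, it is not guessed at.
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc.start


text, error_offset = try_decode_utf8(data)
print(text, error_offset)
```

 Pure ASCII input passes through the same function unchanged, which
is why the problem only shows up once an accented character appears.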

George Zeigler wrote:
> I didn't get it. So what happens if a company had a job site in Unicode,
> and people were copying resume text from Word written in ISO 8859-1
> and pasting into a text window in the browser? Does the character set
> automatically convert correctly? Or does the user need to use a character set
> converter like Recode?

 The text was pasted into Windows Notepad, or some other editor, that
was editing an XML file. XML files are UTF-8 unless otherwise tagged,
but the editor treated the file as something like Windows-1252. So the
right thing to do *might* be to tag the file as 'windows-1252'. A
better solution would be to use only UTF-8-aware editors.
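
 To make the tagging option concrete, here is a small sketch using
Python's standard xml.etree.ElementTree parser. The <resume> element,
its content, and the helper name parses are invented for the example.
It assumes the parser accepts the 'windows-1252' label, which CPython's
expat binding normally does for single-byte encodings. The same bytes
are rejected when the file carries no encoding declaration and accepted
once the declaration is added.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body saved by a windows-1252 editor: é is the single byte 0xE9.
body = "<resume>café</resume>".encode("cp1252")


def parses(raw: bytes) -> bool:
    """Report whether the parser accepts the raw bytes as well-formed XML."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


# No declaration: the parser assumes UTF-8 and stops at the stray 0xE9 byte.
untagged = body

# Declaration added: the parser is told the bytes are windows-1252.
tagged = b'<?xml version="1.0" encoding="windows-1252"?>' + body

print(parses(untagged), parses(tagged))
```

 Tagging only helps if every tool that later touches the file honours
the declaration, which is one reason a UTF-8-aware editor is the safer
choice.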

 My point is that it was hard to tell visually whether the data being
copied fell within a 'safe' subset common to both UTF-8 and
windows-1252 [such as ASCII].
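
 What the eye cannot see is easy to check mechanically. A rough
sketch, with made-up helper names and sample data: bytes below 0x80
are ASCII and mean the same thing in UTF-8 and windows-1252, so any
byte at or above 0x80 marks a spot where the two readings differ.

```python
def non_ascii_offsets(raw: bytes) -> list[int]:
    """Offsets of bytes outside ASCII, i.e. outside the subset shared by both encodings."""
    return [i for i, b in enumerate(raw) if b >= 0x80]


def is_safe_subset(raw: bytes) -> bool:
    """True when every byte is ASCII, so UTF-8 and windows-1252 agree on the text."""
    return not non_ascii_offsets(raw)


plain = b"resume"                        # hypothetical ASCII-only paste
accented = "résumé".encode("cp1252")     # hypothetical paste from a windows-1252 editor

print(is_safe_subset(plain), is_safe_subset(accented), non_ascii_offsets(accented))
```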

-s


