From: Doug Ewell (dewell@adelphia.net)
Date: Tue Feb 18 2003 - 01:30:04 EST
Tex Texin <tex at i18nguy dot com> wrote:
> 6) "UTF-8 signatures are not evil" ok. In and of themselves, they are
> not. Mandating their use everywhere is evil. Notepad is broken in
> always outputting it, since notepad is used for files that are also
> not plain text. The rest of the world should not change because
> notepad is broken. There are plenty of files that have their encoding
> indicated by other means. Adding a UTF-8 BOM where they are not needed
> breaks existing software, filters as Martin mentioned, and adds
> ambiguity in many situations where there is no ambiguity.
This is the tricky part, IMHO.
Notepad is *intended* for plain text. The fact that many people use
Notepad for HTML, something for which it wasn't really intended, isn't
necessarily a defect in Notepad. In fact, MS has always presented
Notepad as a really, really stripped-down editor, almost a toy, and
pointed users toward WordPad (and before that, Write) if they wanted to
get serious work done. (I'm sure MS would prefer we use FrontPage or
Word to create Web pages.) I was frankly surprised that they upgraded
Notepad in Windows 2000 to support UTF-8 and UTF-16.
MS could easily "fix" Notepad to make the writing of UTF-8 signatures a
user-controllable option, as it is in SC UniPad. But as I wrote
earlier, removing the signature would mean that Notepad would have to
rely on either (a) autodetection or (b) user intervention and knowledge
in order to work with UTF-8. This is OK for me, you, and everyone else
on the Unicode mailing list; we understand that there are different
encoding schemes and that we may need to intervene to resolve
differences or errors. But I'm not sure the average Notepad user
understands all this and can deal with it without blaming Unicode.
Don't forget, there are many users who think "Unicode" means "files get
twice as big."
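For anyone wondering how little work the signature saves the application, here is a minimal sketch in Python. The helper name split_signature is hypothetical and is not anything Notepad actually exposes; the only fact relied on is that the UTF-8 signature is the three bytes EF BB BF, which Python provides as codecs.BOM_UTF8.

```python
import codecs
from typing import Tuple


def split_signature(data: bytes) -> Tuple[bool, bytes]:
    """Report whether data starts with the UTF-8 signature (EF BB BF).

    Returns (has_signature, payload), where payload is the data with
    the signature removed if it was present. Hypothetical helper for
    illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


if __name__ == "__main__":
    print(split_signature(b"\xef\xbb\xbfhello"))  # signed file
    print(split_signature(b"hello"))              # unsigned file
```

With the signature present, the check is a three-byte comparison. Without it, the application is back to autodetection or asking the user.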
There will probably be a day when Windows works almost exclusively with
UTF-8 files, rather than files encoded in local 8-bit code pages, and
the encoding-detection heuristic can be reduced to:
1. If any invalid UTF-8 sequences exist, assume local code page.
2. Else, assume UTF-8.
When that day comes, it will be safe for Windows to jettison the UTF-8
signature.
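The two-step heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name guess_decode is made up, and "cp1252" stands in for whatever the local code page happens to be. Python's strict UTF-8 decoder raises UnicodeDecodeError on any invalid sequence, which serves as step 1.

```python
from typing import Tuple


def guess_decode(data: bytes, local_code_page: str = "cp1252") -> Tuple[str, str]:
    """Decode data using the two-step heuristic.

    1. If any invalid UTF-8 sequence exists, assume the local code page.
    2. Else, assume UTF-8.

    Returns (text, encoding_used). The default code page is an
    assumption for illustration; a real program would ask the system.
    """
    try:
        # Strict decoding fails on the first invalid UTF-8 sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" guards against bytes the code page leaves undefined.
        return data.decode(local_code_page, errors="replace"), local_code_page


if __name__ == "__main__":
    print(guess_decode("café".encode("utf-8")))  # valid UTF-8
    print(guess_decode(b"caf\xe9"))              # lone 0xE9 is invalid UTF-8
```

Note that pure ASCII text is valid UTF-8 and so falls under step 2, which is harmless because ASCII decodes identically either way. The heuristic can still misfire on short code-page text that happens to form valid UTF-8 sequences, which is part of why the signature is useful in the meantime.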
-Doug Ewell
Fullerton, California