Re: Variations of UTF-16

From: Shlomi Tal (shlompi@hotmail.com)
Date: Thu Apr 25 2002 - 00:46:11 EDT


{{ But a BOM in every UTF-16 plain text file would make this completely
hopeless. If we ever think we might want to do UNIX-style text processing on
UTF-16, we have to resist that! }}

If you're going to take the trouble of making text tools 16-bit aware, then
you can afford to make them BOM-aware too.

type a.txt b.txt c.txt > d.txt

on Windows 2000, assuming that they are all UTF-16 (each beginning with the
BOM U+FEFF, stored as the bytes FF FE, as is usual in MS-Windows Unicode
files), strips every BOM except the first, so that d.txt has only the usual
single initial BOM. So a BOM is not an immovable obstacle.
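
To make the point concrete, here is a minimal sketch of a BOM-aware
concatenation in Python. It is not a description of what "type" does
internally; the function name concat_utf16 is hypothetical, and treating
unmarked input as little-endian is my assumption (it matches the usual
Windows convention).

```python
import codecs

BOM_LE = codecs.BOM_UTF16_LE  # bytes FF FE
BOM_BE = codecs.BOM_UTF16_BE  # bytes FE FF


def concat_utf16(chunks):
    """Join UTF-16 byte strings into one little-endian stream with a single BOM."""
    out = [BOM_LE]  # exactly one BOM, at the very start of the output
    for data in chunks:
        if data.startswith(BOM_BE):
            # Big-endian input: drop its BOM and decode accordingly.
            text = data[2:].decode("utf-16-be")
        elif data.startswith(BOM_LE):
            # Little-endian input: drop its BOM.
            text = data[2:].decode("utf-16-le")
        else:
            # Assumption: input without a BOM is little-endian.
            text = data.decode("utf-16-le")
        out.append(text.encode("utf-16-le"))
    return b"".join(out)
```

Feeding it the contents of a.txt, b.txt and c.txt (read in binary mode) would
give a d.txt with one leading BOM, whatever mix of byte orders the inputs use.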

Concerning text files: nearly all of the plain-text Unicode I've ever seen is
in UTF-8. However, the ubiquitous MS-Office documents, from Office 2000
onwards, are all in UTF-16 (little-endian, without a BOM).



