Re: Variations of UTF-16

From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Wed Apr 24 2002 - 20:12:43 EDT


   On 24 Apr 2002, at 14:38, Jungshik Shin wrote:

> We don't expect text tools
> to work on files in UTF-16 the same way as we would expect them to work
> on files in UTF-8 or other ASCII-compatible encodings.

   But it might well be desirable to have UNIX-like tools that work on UTF-16
files, in a way analogous to the way that the existing tools work with ASCII.
The underlying philosophy of the UNIX toolset can clearly be applied with equal
success in a world where "plain text" is UTF-16 everywhere:

      cat16 f1 f2 f3 f4 | sort16 | uniq16 | sed16 '....' > f5

   As we see, we need different versions of all the text tools. This is
inconvenient, but not an insurmountable problem. (Maybe they could even be
derived from the same source code as the 8-bit varieties. Maybe some future
system will have *only* 16-bit text tools.)
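
   To illustrate the "same source code" idea, here is a minimal sketch in
Python, not a real tool. The function name sort_uniq is hypothetical. It
assumes the input has no BOM and that the encoding is passed in explicitly.
Under those assumptions, the only encoding-specific steps are the decode on
the way in and the encode on the way out.

```python
def sort_uniq(data: bytes, encoding: str) -> bytes:
    """Rough equivalent of `sort | uniq` for text in the given encoding.

    Hypothetical helper: assumes `data` carries no BOM and that
    `encoding` names a BOM-less codec such as "utf-8" or "utf-16-le".
    """
    # Decode once; everything after this line is encoding-independent.
    lines = data.decode(encoding).splitlines()
    # Drop duplicates, then sort by code point.
    unique_sorted = sorted(set(lines))
    # Re-encode with the same codec on the way out.
    return "".join(line + "\n" for line in unique_sorted).encode(encoding)


text = "pear\napple\npear\n"
out8 = sort_uniq(text.encode("utf-8"), "utf-8")
out16 = sort_uniq(text.encode("utf-16-le"), "utf-16-le")
```

   Both calls run the same logic; only the codec name differs between the
"8-bit" and "16-bit" variants.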

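   But a BOM in every UTF-16 plain text file would make this completely
hopeless: cat16 would leave a BOM in the middle of its output at every file
boundary, and each later tool in the pipeline would have to cope with it. If we
ever think we might want to do UNIX-style text processing on UTF-16, we have to
resist that!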
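
   A small Python sketch of the problem, under stated assumptions: the two
"files" are in-memory byte strings, both little-endian, each starting with a
BOM. The names cat16_naive and cat16_bom_aware are hypothetical. The second
one shows one possible workaround policy, not an established tool.

```python
import codecs

BOM = codecs.BOM_UTF16_LE  # b"\xff\xfe"


def cat16_naive(chunks):
    """Byte-wise concatenation, as a plain cat would do."""
    return b"".join(chunks)


def cat16_bom_aware(chunks):
    """Hypothetical variant: strip a leading BOM from every input first."""
    stripped = []
    for chunk in chunks:
        if chunk.startswith(BOM):
            chunk = chunk[len(BOM):]
        stripped.append(chunk)
    return b"".join(stripped)


# Two small "files", each with its own BOM (assumed little-endian).
f1 = BOM + "pear\n".encode("utf-16-le")
f2 = BOM + "apple\n".encode("utf-16-le")

# The "utf-16" codec consumes only the first BOM; the second one
# survives as U+FEFF at the start of the line "apple".
naive_text = cat16_naive([f1, f2]).decode("utf-16")
naive_sorted = sorted(naive_text.splitlines())

# With every BOM stripped, the data decodes cleanly as BOM-less UTF-16LE.
clean_text = cat16_bom_aware([f1, f2]).decode("utf-16-le")
clean_sorted = sorted(clean_text.splitlines())
```

   In the naive case the stray U+FEFF stays glued to "apple", so that line
sorts after "pear"; in the BOM-aware case the order is the expected one. Every
tool in the pipeline would need similar special handling.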

        /|
 o o o (_|/
        /|
       (_/


