Unicode and end users

From: Martin Kochanski (unicode@cardbox.net)
Date: Thu Feb 14 2002 - 04:22:25 EST


First, let me thank everyone for their wise and experienced comments. This is exactly what this sort of list should be for...

For the sake of clarity, let me define two terms:
1. "Unicode" means Unicode.
2. "UNICODE" means "what an end user thinks when he sees the characters U, n, i, c, o, d, e on the screen, in that order".

What we are trying to establish is the exact meaning that UNICODE ought to have - that is, if it can have one at all.

I suggest that a more technical definition of UNICODE could be "a file format that can be read by programs that read UNICODE". This is pretty certain to be what a user understands by the word!

Now in the world of application programs intended for real human beings (as opposed, for example, to specialised technical tools), I cannot see that any program will survive for long if it cannot read, without user intervention, files written in any of the self-describing Unicode formats (that is, those that begin with a BOM). It follows that any of these formats could, with equal propriety, be described as UNICODE.
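
To make "read without user intervention" concrete, here is a minimal sketch in Python of how a program might recognise the BOM-prefixed formats. The function name decode_self_describing and the table _BOMS are hypothetical names chosen for illustration; the BOM constants come from Python's standard codecs module. It is only one possible approach, and it assumes the whole file is already in memory as bytes.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) starts with the same
# two bytes as the UTF-16 LE BOM (FF FE), so it has to be tested earlier.
# Known limitation: a UTF-16 LE file whose first character is U+0000 would
# be taken for UTF-32 LE by this check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_self_describing(data: bytes) -> str:
    """Decode bytes that begin with a BOM; refuse anything else."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM, then decode the rest with the matching encoding.
            return data[len(bom):].decode(encoding)
    raise ValueError("no BOM found: not a self-describing Unicode format")
```

A caller would pass the raw bytes of the file and receive text back, whichever of the formats the file happened to use; a file without a BOM is rejected rather than guessed at.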

Moving back to output formats: this implies that the only requirement for a program that outputs data should be that if the user asks it to use UNICODE, the program uses one of the self-describing formats. The decision as to *which* of these formats to use would be up to the programmer. Depending on the circumstances, he may hard-wire a specific choice (perhaps whatever is best for the platform), or he may provide a configuration option accessible to more technical users.
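
The output side can be sketched in the same way. In this Python fragment the function encode_as_unicode and the constant DEFAULT_UNICODE_ENCODING are hypothetical: the default stands in for the programmer's hard-wired choice, and the optional argument stands in for a configuration option exposed to more technical users. UTF-16 little-endian is picked as the default purely as an example, not as a recommendation.

```python
import codecs

# The self-describing formats this sketch is willing to write.
_BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}

# Hypothetical hard-wired choice; another platform might prefer another value.
DEFAULT_UNICODE_ENCODING = "utf-16-le"


def encode_as_unicode(text: str, encoding: str = DEFAULT_UNICODE_ENCODING) -> bytes:
    """Encode text in the chosen format, always preceded by its BOM."""
    try:
        bom = _BOM_FOR[encoding]
    except KeyError:
        raise ValueError("not a supported self-describing format: " + encoding)
    return bom + text.encode(encoding)
```

Whatever value is chosen, the result begins with a BOM, so any reader that recognises the self-describing formats can open the file without asking the user anything.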

Now, a question:

Are there, in fact, many circumstances in which it is necessary for an end user to create files that do *not* have a BOM at the beginning?


