From: Hans Aberg (haberg@math.su.se)
Date: Mon Nov 28 2005 - 14:38:00 CST
On 28 Nov 2005, at 20:49, Neil Harris wrote:
> The set of ASCII strings is a proper subset of the set of UTF-8
> strings, so no information would need to be stored about which of
> those encodings was being used.
So it would seem, but I think that UNIX, under some circumstances
(though I do not remember which), needs to know that a string is
ASCII and not anything else. But I guess one will see what works best
when making a UTF-8-enabled UNIX.
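The subset claim quoted above can be checked directly at the byte level. The following is a minimal Python sketch. The helper names `is_ascii` and `ascii_is_utf8` are hypothetical and used only for illustration. It checks that every pure-ASCII byte string decodes to the same text whether read as ASCII or as UTF-8. It also checks that ASCII text encodes to identical bytes under both encodings.

```python
def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: a byte string is pure ASCII if every byte is below 0x80."""
    return all(b < 0x80 for b in data)


def ascii_is_utf8(data: bytes) -> bool:
    """Hypothetical helper: for pure-ASCII input, decoding as ASCII and as
    UTF-8 yields the same text, so no encoding tag is needed."""
    if not is_ascii(data):
        raise ValueError("input is not pure ASCII")
    return data.decode("ascii") == data.decode("utf-8")


# Exhaustive check over all 128 ASCII byte values.
all_ascii = bytes(range(128))
print(ascii_is_utf8(all_ascii))

# The same holds in the encoding direction: ASCII text has one byte form.
text = "".join(chr(i) for i in range(128))
print(text.encode("ascii") == text.encode("utf-8"))
```

The reverse does not hold. A UTF-8 string containing, say, "é" has no ASCII form at all, so the relation is a proper subset, as stated.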
> Now, ISO 8859-1, that's a different matter -- I suppose you could
> still use the property that _almost all_ non-pure-ASCII ISO 8859-1
> natural language strings are not also valid UTF-8 strings for
> backwards compatibility, and ditto for most other fixed 8-bit
> encodings, but I certainly wouldn't be willing to trust my
> filesystem to this sort of hack.
I'll pass on this one. There are different approaches, though: mixed
encodings, or UTF-8 throughout.
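The heuristic Neil describes, and his reason for distrusting it, can be sketched in a few lines. This is only an illustration under stated assumptions. The function `guess_encoding` is hypothetical and is not what any particular filesystem or UNIX does. It treats a byte string as ASCII if every byte is below 0x80. Otherwise it treats the string as UTF-8 if it decodes cleanly, and it falls back to ISO 8859-1 in every other case, since every byte sequence is valid ISO 8859-1.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes form well-formed UTF-8 (Python's strict decoder)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: ASCII, else UTF-8 if valid, else ISO 8859-1."""
    if all(b < 0x80 for b in data):
        return "ascii"
    if looks_like_utf8(data):
        return "utf-8"
    # Any byte sequence decodes as ISO 8859-1, so this is the catch-all.
    return "iso-8859-1"


print(guess_encoding(b"hello"))
print(guess_encoding("café".encode("utf-8")))
print(guess_encoding("café".encode("latin-1")))
# The "almost all" caveat: ISO 8859-1 text that happens to be valid UTF-8.
print(guess_encoding("Ã©".encode("latin-1")))
```

The first three cases come out as intended. The last one shows the failure mode. The two ISO 8859-1 characters "Ã©" are stored as the bytes 0xC3 0xA9, and those bytes are also the valid UTF-8 encoding of "é", so the heuristic silently misreads them.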
Hans Aberg
This archive was generated by hypermail 2.1.5 : Mon Nov 28 2005 - 16:36:25 CST