Re: texteditors that can process and save in different encodings from Philippe Verdy on 2012-10-19 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 19 Oct 2012 22:18:33 +0200

2012/10/18 Doug Ewell <doug_at_ewellic.org>:
> Philippe Verdy wrote:
>> ASCII,
>
> A strict subset of UTF-8, so no need to support this separately.

Not really. If the file being saved needs no character beyond ASCII,
i.e. none of the additional characters found in the 8-bit extended
character sets (there are many of them), then saving it as ASCII
(i.e. recording that charset in the metadata) keeps the encoded text
compatible with all of those extended charsets (notably every
ISO 8859-* code page) as well as with UTF-8.
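
As a rough illustration of that compatibility claim, here is a small
Python sketch. The helper name same_under_all and the particular list
of codecs are my own choices for the example, not part of any
standard; the codec names are the ones Python's standard library
accepts.

```python
# Codecs assumed for the illustration: ASCII, UTF-8 and a few ISO 8859 parts.
COMPATIBLE_CODECS = ("ascii", "utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-15")

def same_under_all(data: bytes, codecs=COMPATIBLE_CODECS) -> bool:
    """Return True if `data` decodes to the same string under every codec."""
    decoded = set()
    for codec in codecs:
        try:
            decoded.add(data.decode(codec))
        except UnicodeDecodeError:
            # At least one codec cannot read these bytes at all.
            return False
    return len(decoded) == 1

# Pure ASCII bytes read the same everywhere.
print(same_under_all("Plain text".encode("ascii")))
# Bytes for a non-ASCII character do not.
print(same_under_all("caf\u00e9".encode("utf-8")))
```

Only text restricted to the ASCII range passes the check, which is why
the ASCII label is the least demanding one a file can carry.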

This does not mean that the encoder will be different. The difference
is only in the metadata you emit for the encoded file. If you always
indicate UTF-8, the file may be rejected by any application that does
not expect to be able to handle Unicode correctly. Such an application
will reject the file without even trying to decode it.

This matches the principle of "being lenient when reading (in other
applications), but strict when writing (specify only the real minimum
requirements for decoding the file)".

However, if the file already specified the "UTF-8" encoding, that label
should not be changed blindly and automatically to "ASCII", because
later editors may then restrict the usable character set, or may
attempt to store approximations if you ever insert a non-ASCII
character into what was intended to be directly compatible with UTF-8.
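
The labelling rule described above, including the exception for files
already marked as UTF-8, could be sketched in Python roughly as
follows. The function name charset_label and its declared parameter
are hypothetical, and falling back to UTF-8 for any non-ASCII text is
an assumption made for the example; a real editor might prefer an
ISO 8859 part when one suffices.

```python
from typing import Optional

def charset_label(text: str, declared: Optional[str] = None) -> str:
    """Pick the charset name to write into the file's metadata."""
    # Never downgrade a file that was already labelled UTF-8.
    if declared is not None and declared.lower() in ("utf-8", "utf8"):
        return "UTF-8"
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # Assumption for this sketch: anything beyond ASCII is labelled UTF-8.
        return "UTF-8"
    # The bytes are identical either way; only the label is minimal.
    return "US-ASCII"

print(charset_label("plain text"))            # no prior label
print(charset_label("plain text", "UTF-8"))   # existing label is kept
```

Note that the encoder itself is untouched: only the name written into
the metadata changes.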

This applies, for example, to emails: minimize the decoding
requirements when sending one. Each email is independent of the
others, even when it replies to a previous one that is partially or
fully included in the response; the link between emails is not part of
their text, but part of their tracking MIME headers and of the
metadata kept for local processing in mail agents or proxies.
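
For the email case, a minimal sketch with Python's standard email
package might look like this. The helper name build_message is
hypothetical, and choosing between only "us-ascii" and "utf-8" is a
simplification made for the example.

```python
from email.message import EmailMessage

def build_message(body: str) -> EmailMessage:
    """Build a text/plain message labelled with the least demanding charset."""
    msg = EmailMessage()
    # Each message is labelled on its own merits, whatever it replies to.
    charset = "us-ascii" if body.isascii() else "utf-8"
    msg.set_content(body, charset=charset)
    return msg

msg = build_message("Thanks, that works for me.")
print(msg["Content-Type"])
```

A reply that quotes a UTF-8 message but contains only ASCII text itself
would therefore still go out labelled as us-ascii.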
Received on Fri Oct 19 2012 - 15:23:25 CDT
