schererm@us.ibm.com wrote:
>
> Line ends in Unicode may be unambiguously coded with LS (Line Separator,
> U+2028) and PS (Paragraph Separator, U+2029) characters, see TR 13.
>
> This means for emails in UTF-8, that they may not be "well-formed" because
> they may not contain CR (13) and/or LF (10) ASCII line ends.
>
> I believe there are (at least) three ways to deal with this, and I would
> like to know which one(s) is (are) recommended or used:
>
> 1) Disregard TR13 for emails and write only ASCII-style (LF, CR, CRLF)
> line ends.
We (Netscape) offer this as one option: plain-text (non-HTML) UTF-8 or
UTF-7 with CRLF as the line end, as SMTP requires. The user can also
select HTML, or one of several other encodings (e.g. iso-8859-1).
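
As a rough illustration of option (1), the sketch below folds every kind of
line end, including LS and PS, into CRLF before the body is encoded as UTF-8.
The function name to_smtp_body is hypothetical and this is not Netscape's
code; it only assumes that the text is available as a Unicode string.

```python
import re

# ASCII-style line ends plus Unicode LS (U+2028) and PS (U+2029).
# CRLF is listed first so it is matched as one break rather than two.
_LINE_END = re.compile("\r\n|\r|\n|\u2028|\u2029")

def to_smtp_body(text: str) -> bytes:
    """Replace every line end with CRLF, then encode the text as UTF-8.

    Hypothetical helper for illustration only.
    """
    return _LINE_END.sub("\r\n", text).encode("utf-8")
```

Note that this loses the distinction between a line break and a paragraph
break, which is the cost of option (1).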
> 2) Write Unicode email bodies with a modified or new encoding that breaks
> lines with LF...
> that are not part of the Unicode text, and encode the text itself with:
> 2a) disregard the minimum-length rule for UTF-8 and encode U+0000 to U+001f
> with (otherwise UTF-8-compliant) two-byte codes
I don't understand this.
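
One possible reading of 2a, sketched here only as an assumption about what
was meant: every control character U+0000 to U+001F in the text is written
as the non-shortest two-byte form 0xC0 0x80+n, so that any bare CR or LF
byte in the message can only be a transport line break. The name encode_2a
is hypothetical.

```python
def encode_2a(text: str) -> bytes:
    """Encode text as UTF-8, except that U+0000..U+001F are written in the
    overlong two-byte form 110000xx 10xxxxxx (assumed reading of 2a)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # Overlong form: lead byte 0xC0, trail byte carries the low 6 bits.
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        else:
            out += ch.encode("utf-8")
    return bytes(out)
```

A strict UTF-8 decoder rejects such sequences (Python's own decoder raises
UnicodeDecodeError on them), so a receiver would need a matching custom
decoder.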
> 2b) binary/base64-encoded UTF-16
Certainly seems legal from the MIME standpoint, but probably won't be
popular for a while.
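
A minimal sketch of 2b, assuming big-endian UTF-16 without a byte order mark
and the standard library's base64 module. The function name
utf16_base64_body is hypothetical. The CRLFs in the output belong to the
transfer encoding only; LS and PS survive inside the encoded text.

```python
import base64

def utf16_base64_body(text: str) -> bytes:
    """Return the text as UTF-16BE, base64-encoded in lines of at most
    76 characters, each terminated by CRLF."""
    raw = text.encode("utf-16-be")
    # encodebytes() breaks the output into 76-character lines ending in LF;
    # convert those to CRLF for transport.
    return base64.encodebytes(raw).replace(b"\n", b"\r\n")
```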
> 2c) create an email-only variable-length encoding with 7 bits/email-byte
This already exists. It is called UTF-7.
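
For illustration, Python's built-in "utf-7" codec shows the effect: LS and PS
come out as 7-bit ASCII and decode back unchanged. The function name to_utf7
is hypothetical.

```python
def to_utf7(text: str) -> bytes:
    """Encode text as UTF-7 using Python's built-in codec."""
    return text.encode("utf-7")

# LS (U+2028) and PS (U+2029) are carried as ASCII-only byte sequences.
encoded = to_utf7("first line\u2028second line\u2029")
decoded = encoded.decode("utf-7")
```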
> 2d) ?
>
> 3) Do not use LS and PS but instead require Unicode email bodies to use
> HTML or similar, and use <br> and <p>;
> similar to (2), old-style line ends are inserted only for the sake of
> protocol-conformance and are not part of the displayed text
As I said, we offer this as an option. HTML may even be the default. (I
don't remember.)
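
Option (3) could look roughly like the sketch below, which maps PS to
paragraph elements and LS to <br>. The function name to_html_body is
hypothetical and this is not Netscape's implementation; the CRLFs it emits
are there for protocol conformance only and are not part of the displayed
text.

```python
import html

def to_html_body(text: str) -> str:
    """Turn PS-separated paragraphs into <p> elements and LS into <br>."""
    parts = []
    for para in text.split("\u2029"):
        if not para:
            continue  # skip the empty piece after a trailing PS
        # Escape markup characters so the text displays literally.
        lines = [html.escape(line) for line in para.split("\u2028")]
        parts.append("<p>" + "<br>\r\n".join(lines) + "</p>")
    return "\r\n".join(parts)
```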
> I guess that (1) and (3) would be the most popular choices.
Currently, non-Unicode-based encodings are the most popular, and plain
text is probably still more popular than HTML. Both of these may change
eventually.
Erik