Philippe Verdy via Unicode wrote in
<CAGa7JC3UomnN+Qzr3JGhqgJY+e-y6AYFk+w9+jEARW4Ghyk8hg_at_mail.gmail.com>:
|You forget that Base64 (as used in MIME) does not follow these rules
|as it allows multiple different encodings for the same source binary.
|MIME actually splits a binary object into multiple fragments at random
|positions, and then encodes these fragments separately. Also MIME uses
|an extension of Base64 where it allows some variations in the encoding
|alphabet (so even the same fragment of the same length may have two
|distinct encodings).
|
|Base64 in MIME is different from standard Base64 (which never splits
|the binary object before encoding it, and uses a strict alphabet of
|64 ASCII characters, allowing no variation). So MIME requires special
|handling: the assumption that a binary message is encoded the same is
|wrong, but MIME still requires that this non-unique Base64 encoding
|will be decoded back to the same initial (unsplit) binary object
|(independently of its size and independently of the splitting
|boundaries used in the transport, which may change during the
|transport).
Base64 is defined in RFC 2045 (Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies).
It is a content-transfer-encoding and encodes any data
transparently into 7-bit clean text that is ASCII _and_ EBCDIC
compatible (the authors make a point of noting that).
Decoding reverts this representation into its original form.
Ok, there is the CRLF newline problem, as below.
What do you mean by "splitting"?
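To make the round trip concrete, here is a minimal sketch using Python's standard base64 module (my choice for illustration, nothing the RFC prescribes). As far as I read RFC 2045, the one freedom an encoder has is where it places line breaks (encoded lines may be at most 76 characters), and a decoder must ignore line breaks and other characters outside the alphabet. The rewrap helper and the 40-character width are hypothetical.

```python
import base64

data = bytes(range(256)) * 2  # arbitrary binary payload

# One unbroken run of base64 characters.
unbroken = base64.b64encode(data)

# MIME-style output: encodebytes() ends each 76-character line with "\n".
wrapped = base64.encodebytes(data)


def rewrap(encoded: bytes, width: int) -> bytes:
    """Hypothetical helper: break an unbroken base64 run into CRLF-separated lines."""
    return b"\r\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


narrow = rewrap(unbroken, 40)

# The three textual forms differ ...
assert unbroken != wrapped
assert wrapped != narrow

# ... but all decode to the same octets, because b64decode() (with its
# default validate=False) discards characters outside the alphabet.
for form in (unbroken, wrapped, narrow):
    assert base64.b64decode(form) == data
```

So different line lengths give different text, yet the decoded octets stay identical.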
...
The only variance is described as:
  Care must be taken to use the proper octets for line breaks if base64
  encoding is applied directly to text material that has not been
  converted to canonical form. In particular, text line breaks must be
  converted into CRLF sequences prior to base64 encoding. The
  important thing to note is that this may be done directly by the
  encoder rather than in a prior canonicalization step in some
  implementations.
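A minimal sketch of that canonicalization step, assuming Python and its standard base64 module. The function name canonical_base64 and the UTF-8 default are hypothetical, chosen only for illustration.

```python
import base64


def canonical_base64(text: str, charset: str = "utf-8") -> bytes:
    """Convert local line breaks to CRLF, then base64-encode the result."""
    # Collapse CRLF and bare CR to LF first so that nothing is doubled,
    # then expand every LF to the canonical CRLF.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    canonical = unified.replace("\n", "\r\n")
    return base64.b64encode(canonical.encode(charset))


local = "first line\nsecond line\n"  # Unix-style local form
encoded = canonical_base64(local)

# The decoded octets carry CRLF, independent of the local convention.
assert base64.b64decode(encoded) == b"first line\r\nsecond line\r\n"

# Encoding the local form directly yields different base64 text.
assert encoded != base64.b64encode(local.encode("utf-8"))
```

This only applies to text material; binary data must of course be encoded untouched.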
This is MIME, it specifies (in the same RFC):
  2.10. Lines
  "Lines" are defined as sequences of octets separated by a CRLF
  sequences. This is consistent with both RFC 821 and RFC 822.
  "Lines" only refers to a unit of data in a message, which may or may
  not correspond to something that is actually displayed by a user
  agent.
and furthermore
  6.5. Translating Encodings
  The quoted-printable and base64 encodings are designed so that
  conversion between them is possible. The only issue that arises in
  such a conversion is the handling of hard line breaks in quoted-
  printable encoding output. When converting from quoted-printable to
  base64 a hard line break in the quoted-printable form represents a
  CRLF sequence in the canonical form of the data. It must therefore be
  converted to a corresponding encoded CRLF in the base64 form of the
  data. Similarly, a CRLF sequence in the canonical form of the data
  obtained after base64 decoding must be converted to a quoted-
  printable hard line break, but ONLY when converting text data.
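The quoted-printable to base64 direction can be sketched as follows, assuming Python's standard quopri and base64 modules and a body that is already in wire form, with CRLF hard line breaks. The function name qp_text_to_base64 is hypothetical.

```python
import base64
import quopri


def qp_text_to_base64(qp_body: bytes) -> bytes:
    """Re-encode a quoted-printable text body (wire form, CRLF) as base64."""
    # Decoding removes soft line breaks ("=" at end of line) and resolves
    # =XX escapes; hard line breaks remain as CRLF in the decoded octets.
    canonical = quopri.decodestring(qp_body)
    # Those CRLF octets are then encoded like any other data.
    return base64.b64encode(canonical)


qp_body = (
    b"caf=C3=A9, folded with a soft=\r\n"  # soft break: vanishes on decoding
    b" line break\r\n"                     # hard break: becomes CRLF octets
    b"second line\r\n"
)

b64_body = qp_text_to_base64(qp_body)
assert base64.b64decode(b64_body) == (
    b"caf\xc3\xa9, folded with a soft line break\r\nsecond line\r\n"
)
```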
So we go over
  6.6. Canonical Encoding Model
  There was some confusion, in the previous versions of this RFC,
  regarding the model for when email data was to be converted to
  canonical form and encoded, and in particular how this process would
  affect the treatment of CRLFs, given that the representation of
  newlines varies greatly from system to system, and the relationship
  between content-transfer-encodings and character sets. A canonical
  model for encoding is presented in RFC 2049 for this reason.
to RFC 2049 where we find
  For example, in the case of text/plain data, the text
  must be converted to a supported character set and
  lines must be delimited with CRLF delimiters in
  accordance with RFC 822. Note that the restriction on
  line lengths implied by RFC 822 is eliminated if the
  next step employs either quoted-printable or base64
  encoding.
and, later
  Conversion from entity form to local form is accomplished by
  reversing these steps. Note that reversal of these steps may produce
  differing results since there is no guarantee that the original and
  final local forms are the same.
and, later
  NOTE: Some confusion has been caused by systems that represent
  messages in a format which uses local newline conventions which
  differ from the RFC822 CRLF convention. It is important to note that
  these formats are not canonical RFC822/MIME. These formats are
  instead *encodings* of RFC822, where CRLF sequences in the canonical
  representation of the message are encoded as the local newline
  convention. Note that formats which encode CRLF sequences as, for
  example, LF are not capable of representing MIME messages containing
  binary data which contains LF octets not part of CRLF line separation
  sequences.
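The last sentence of that note can be demonstrated with a minimal Python sketch. The two helper functions are hypothetical stand-ins for a message store that keeps CRLF as the local LF; the binary sample is the 8-byte PNG signature, which contains both a CRLF and a bare LF.

```python
def to_lf_store(canonical: bytes) -> bytes:
    """Hypothetical local storage format: every CRLF is kept as a bare LF."""
    return canonical.replace(b"\r\n", b"\n")


def from_lf_store(stored: bytes) -> bytes:
    """Reverse mapping: every LF is assumed to have been a CRLF."""
    return stored.replace(b"\n", b"\r\n")


# Canonical text survives the round trip.
text = b"line one\r\nline two\r\n"
assert from_lf_store(to_lf_store(text)) == text

# Binary data with an LF that is not part of a CRLF does not.
binary = b"\x89PNG\r\n\x1a\n"
restored = from_lf_store(to_lf_store(binary))
assert restored == b"\x89PNG\r\n\x1a\r\n"
assert restored != binary
```

Once stored, the bare LF and the former CRLF are indistinguishable, so no reverse mapping can restore both.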
Whoever understands this emojibake.
My MUA still gnaws at antiquated structures (I am too lazy), but
in quoted-printable we encode a CRLF in the raw text as "=0D=0A=",
i.e., with a trailing soft line break, so that the data is decoded as
a plain CRLF again. It should be something like that, I think.
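A quick check of that with Python's standard quopri module (only an illustration, not what my MUA does internally):

```python
import quopri

# CRLF from the raw data written as =0D=0A, followed by a soft line
# break ("=" directly before the line end) that is not part of the data.
encoded = b"first line=0D=0A=\r\nsecond line"

decoded = quopri.decodestring(encoded)

# The escapes become the CRLF octets; the soft line break disappears.
assert decoded == b"first line\r\nsecond line"
```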
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
Received on Sat Oct 13 2018 - 11:49:57 CDT