From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Feb 27 2003 - 17:07:55 EST
I'm not sure this is the right forum to discuss this issue. I found this
"problem" while debugging a UTF-8 email message.
When I looked into some of the email we had problems with, I saw
Content-Type headers like the following:
Content-Type: text/html; charset="UTF-8"
As I remember, the MIME specification does not allow quotation marks
around the charset parameter value; it should only accept
Content-Type: text/html; charset=UTF-8
but not charset="UTF-8"
So... I checked the MIME spec to figure out whether it is allowed. What
shocked me is that the original MIME specification, RFC 1521, disallows it; see
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.1
and
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.2
The formal grammar for the content-type header field for text is as
follows:
text-type := "text" "/" text-subtype [";" "charset" "=" charset]
text-subtype := "plain" / extension-token
charset := "us-ascii"/ "iso-8859-1"/ "iso-8859-2"/ "iso-8859-3"
/ "iso-8859-4"/ "iso-8859-5"/ "iso-8859-6"/ "iso-8859-7"
/ "iso-8859-8" / "iso-8859-9" / extension-token
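To make the RFC 1521 rule concrete, here is a minimal Python sketch of a strict check. The helper names (`is_mime_token`, `strict_rfc1521_charset`) are hypothetical, and it assumes the field body contains no comments and no ';' inside values. Syntactically, every listed name and every extension-token is a bare token, so a value wrapped in double quotes fails.

```python
# tspecials shared by RFC 1521 and RFC 2045
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_mime_token(s):
    """token := 1*<any (ASCII) CHAR except SPACE, CTLs, or tspecials>"""
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def strict_rfc1521_charset(field_body):
    """Return the raw charset value if it fits the RFC 1521 text-type grammar, else None.

    Hypothetical helper: it only looks at 'charset=' and assumes no comments
    and no quoted ';' inside the field body.
    """
    for part in field_body.split(';')[1:]:
        name, sep, value = part.strip().partition('=')
        if sep and name.lower() == 'charset':
            # A listed name or an extension-token is syntactically a bare token,
            # so a double-quoted value is rejected here.
            return value if is_mime_token(value) else None
    return None


print(strict_rfc1521_charset('text/html; charset=UTF-8'))    # UTF-8
print(strict_rfc1521_charset('text/html; charset="UTF-8"'))  # None
```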
But RFC 2045, which obsoleted RFC 1521, allows the charset name to be a
quoted string; see http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2045.html#sec-5.1
parameter := attribute "=" value
attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.
....
value := token / quoted-string
Note that the value of a quoted string parameter does not include
the quotes. That is, the quotation marks in a quoted-string are not
a part of the value of the parameter, but are merely used to delimit
that parameter value. In addition, comments are allowed in
accordance with RFC 822 rules for
structured header fields. Thus the following two forms
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
are completely equivalent.
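A receiver that follows RFC 2045 therefore has to treat both forms the same way. The sketch below is Python; `parse_content_type` and its helpers are hypothetical names, not any mailer's real API. For each parameter it reads either a token or a quoted-string, drops the quotes, skips RFC 822 comments, and lowercases attribute names, so the two example headers above produce identical results. It takes only the field body after "Content-type:" and does not handle header line folding.

```python
# tspecials from RFC 2045; a token may not contain these, SPACE, or CTLs
TSPECIALS = set('()<>@,;:\\"/[]?=')


def _skip_cfws(s, i):
    """Skip spaces, tabs and RFC 822 comments (which may nest) starting at index i."""
    while i < len(s):
        if s[i] in ' \t':
            i += 1
        elif s[i] == '(':
            depth = 0
            while i < len(s):
                if s[i] == '\\':        # quoted-pair inside a comment
                    i += 2
                    continue
                if s[i] == '(':
                    depth += 1
                elif s[i] == ')':
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
        else:
            break
    return i


def _read_token(s, i):
    start = i
    while i < len(s) and 32 < ord(s[i]) < 127 and s[i] not in TSPECIALS:
        i += 1
    return s[start:i], i


def _read_quoted(s, i):
    """Read a quoted-string starting at s[i] == '"'; return (value without quotes, next index)."""
    i += 1
    out = []
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            out.append(s[i + 1])        # backslash quotes the next character
            i += 2
        elif s[i] == '"':
            return ''.join(out), i + 1
        else:
            out.append(s[i])
            i += 1
    raise ValueError('unterminated quoted-string')


def parse_content_type(body):
    """Parse a Content-Type field body into ('type/subtype', {attribute: value})."""
    i = _skip_cfws(body, 0)
    mtype, i = _read_token(body, i)
    i = _skip_cfws(body, i)
    if i >= len(body) or body[i] != '/':
        raise ValueError('expected "/" after the media type')
    subtype, i = _read_token(body, _skip_cfws(body, i + 1))
    params = {}
    while True:
        i = _skip_cfws(body, i)
        if i >= len(body):
            break
        if body[i] != ';':
            raise ValueError(f'unexpected {body[i]!r} at offset {i}')
        attr, i = _read_token(body, _skip_cfws(body, i + 1))
        i = _skip_cfws(body, i)
        if not attr or i >= len(body) or body[i] != '=':
            raise ValueError('malformed parameter')
        i = _skip_cfws(body, i + 1)
        if i < len(body) and body[i] == '"':
            value, i = _read_quoted(body, i)
        else:
            value, i = _read_token(body, i)
        params[attr.lower()] = value    # attribute matching is case-insensitive
    return f'{mtype}/{subtype}'.lower(), params


a = parse_content_type('text/plain; charset=us-ascii (Plain text)')
b = parse_content_type('text/plain; charset="us-ascii"')
print(a == b)  # True
```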
I was never aware of this difference between RFC 1521 and RFC 2045. I'm not
sure whether you folks were aware of it.
I also checked HTTP/1.1 (RFC 2068) and HTTP/1.0 (RFC 1945). It looks like
both specifications contain conflicting language about this issue within
the same document:
http://www.w3.org/Protocols/rfc1945/rfc1945
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html
In one place they say:
charset = "US-ASCII"
| "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
| "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
| "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
| "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
| "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
| token
and
token = 1*<any CHAR except CTLs or tspecials>
tspecials = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
which rules out the use of a quoted-string, since the double quote (<">)
is one of the tspecials and therefore cannot appear in a token.
In another place they say:
3.6 Media Types
HTTP uses Internet Media Types [13] in the Content-Type header field
(Section 10.5) in order to provide open and extensible data typing.
media-type = type "/" subtype *( ";" parameter )
....
parameter = attribute "=" value
....
value = token | quoted-string
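The conflict shows up if you check the same raw parameter text against both HTTP rules. This is a rough Python sketch with hypothetical function names; the quoted-string check is simplified and ignores backslash escapes.

```python
# tspecials as listed in the RFC 1945 / RFC 2068 grammar (SP and HT included)
HTTP_TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s):
    """token = 1*<any CHAR except CTLs or tspecials>"""
    return bool(s) and all(32 <= ord(c) <= 126 and c not in HTTP_TSPECIALS for c in s)


def is_quoted_string(s):
    """Simplified quoted-string: <"> *(any TEXT except <">) <">, backslash escapes ignored."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return False
    return all((c == '\t' or (ord(c) >= 32 and ord(c) != 127)) and c != '"' for c in s[1:-1])


def charset_rule_accepts(raw):
    # charset = "US-ASCII" | ... | token ; every listed name is itself a token
    return is_token(raw)


def parameter_value_rule_accepts(raw):
    # value = token | quoted-string
    return is_token(raw) or is_quoted_string(raw)


for raw in ('UTF-8', '"UTF-8"'):
    print(raw, charset_rule_accepts(raw), parameter_value_rule_accepts(raw))
# UTF-8 True True
# "UTF-8" False True
```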
:( :( :( :(
Therefore we need to make sure that:
1. Every mailer that receives email deals not only with charset=value
but also with charset="value". I am not sure whether Mozilla can deal
with it. How about your email program?
2. The browser can deal with
Content-Type: text/html; charset="value"
in addition to
Content-Type: text/html; charset=value
3. Because we also use META tags in HTML to reflect the HTTP header,
the browser has to deal not only with the following kinds of META tag
<meta http-equiv="content-type" content="text/html; charset=value">
<meta http-equiv="content-type" content='text/html; charset=value'>
but also
<meta http-equiv="content-type" content='text/html; charset="value"'>
:( :( :( :(
I'm not sure whether Mozilla handles 2 or 3. How about IE?
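For case 3, the HTML attribute parser strips the attribute quotes, but the content value that is left still has to go through the same charset-parameter handling as an HTTP header. A small Python sketch using the standard library's `html.parser.HTMLParser`; `MetaCharsetFinder` and `charset_from_content` are hypothetical names, and the parameter handling is deliberately simplified (it splits on ';' and strips one pair of surrounding double quotes).

```python
from html.parser import HTMLParser


def charset_from_content(content):
    """Return the lowercased charset from 'type/subtype; charset=...', or None.

    Simplified: splits on ';' (so ';' inside a quoted value is not handled)
    and strips one pair of surrounding double quotes.
    """
    for part in content.split(';')[1:]:
        name, sep, value = part.partition('=')
        if sep and name.strip().lower() == 'charset':
            value = value.strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            return value.lower() or None
    return None


class MetaCharsetFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names and removes the attribute quotes.
        if tag != 'meta':
            return
        attrs = dict(attrs)
        if (attrs.get('http-equiv') or '').lower() == 'content-type':
            self.charsets.append(charset_from_content(attrs.get('content') or ''))


page = '''
<meta http-equiv="content-type" content="text/html; charset=value">
<meta http-equiv="content-type" content='text/html; charset=value'>
<meta http-equiv="content-type" content='text/html; charset="value"'>
'''
finder = MetaCharsetFinder()
finder.feed(page)
finder.close()
print(finder.charsets)  # ['value', 'value', 'value']
```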
However, for email, since RFC 1521 does NOT allow it, to make sure it
works with most email programs, when we send out Internet email we
should use
Content-Type: text/html; charset=UTF-8
instead of
Content-Type: text/html; charset="UTF-8"
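On the sending side, the conservative approach could look like the Python sketch below: write the charset bare whenever it is a valid token (as common names like UTF-8 and ISO-8859-1 are), and fall back to an RFC 2045 quoted-string only when there is no other choice. `format_content_type` is a hypothetical helper; note that a strictly RFC 1521 reader cannot handle the fallback form anyway.

```python
# tspecials from RFC 2045; a token may not contain these, SPACE, or CTLs
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_mime_token(s):
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def format_content_type(mime_type, charset):
    """Build a Content-Type field body, leaving charset unquoted whenever it is a token."""
    if is_mime_token(charset):
        return f'{mime_type}; charset={charset}'
    # Fallback for non-token values: quoted-string with backslash escapes.
    escaped = charset.replace('\\', '\\\\').replace('"', '\\"')
    return f'{mime_type}; charset="{escaped}"'


print('Content-Type: ' + format_content_type('text/html', 'UTF-8'))
# Content-Type: text/html; charset=UTF-8
```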
Can you check this issue with the product you are working on?