RE: information request; using Unicode in HTML form; urlencoded

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Oct 06 2000 - 19:11:13 EDT


Mike,

In practice, however, people usually write pages containing forms in a
specific charset and expect the URL to return in the same charset. They will
escape the chars if not ASCII, but in that charset (which could be UTF-8).

Carl

-----Original Message-----
From: Mike Brown [mailto:mbrown@corp.webb.net]
Sent: Friday, October 06, 2000 2:45 PM
To: Unicode List
Cc: 'hle@comergent.com'
Subject: RE: information request; using unicode in HTML form; urlencoded

> The last rule will clip Unicode character to an 8-bit
> representation

The HTML Recommendation and the IETF RFC for URIs both cover this. Anything
URL-encoded is supposed to be UTF-8 encoded first (see the URI RFC).
However, the HTML Recommendation's section on form data is a little vague
about encoding, especially if you are using a MIME message instead of
URL-encoding. Also, the major browsers will typically submit form data with
the same charset as the HTML document containing the form.
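
To make the difference concrete, here is a minimal sketch (Python, standard
library only) of how the same character percent-encodes under two charsets.
The sample character and the choice of ISO-8859-1 as the "other" charset are
illustrative assumptions, not something taken from the original question.

```python
from urllib.parse import quote, unquote

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, a non-ASCII character

# Percent-encode the character after encoding it to bytes in each charset.
as_utf8 = quote(text, encoding="utf-8")         # two bytes, two escapes
as_latin1 = quote(text, encoding="iso-8859-1")  # one byte, one escape

print(as_utf8)    # %C3%A9
print(as_latin1)  # %E9

# Decoding with the wrong charset does not fail loudly; it yields mojibake.
misread = unquote(as_utf8, encoding="iso-8859-1")
print(repr(misread))
```

The receiver cannot tell from the escaped string alone which charset was
used, which is why the charset of the page containing the form matters.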

To encourage the browser to send URL-encoded UTF-8 form data, you should
make sure that the HTML document with the form is itself UTF-8 encoded, and
declares itself as such, usually via the appropriate <meta> element. Beyond
that there is still a risk that the user might override the encoding on
their end, but what can you do?
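
One possible hedge on the receiving side, sketched here in Python under
stated assumptions: try to read the submitted bytes as UTF-8 and fall back to
a legacy charset only if they are not valid UTF-8. The function name
decode_form_value and the ISO-8859-1 fallback are hypothetical choices for
illustration, not part of any specification.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: str, fallback: str = "iso-8859-1") -> str:
    """Decode one URL-encoded form value, preferring UTF-8.

    Hypothetical helper: 'fallback' is an assumed legacy charset used
    only when the bytes are not well-formed UTF-8.
    """
    # In form data, '+' stands for a space; undo that before unescaping.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")  # strict: raises on malformed input
    except UnicodeDecodeError:
        return data.decode(fallback)


print(decode_form_value("caf%C3%A9"))  # bytes were UTF-8
print(decode_form_value("caf%E9"))     # bytes were not UTF-8, fallback used
```

This is only a heuristic: legacy-charset bytes that happen to form valid
UTF-8 would be read as UTF-8, so it reduces the risk rather than removing it.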

-Mike


