RE: information request; using unicode in HTML form; urlencoded

From: Mike Brown (mbrown@corp.webb.net)
Date: Fri Oct 06 2000 - 18:03:16 EDT


> The last rule will clip Unicode characters to an 8-bit
> representation

The HTML Recommendation and the IETF RFC for URIs both touch on this.
Non-ASCII characters are supposed to be converted to UTF-8 bytes before
being URL-encoded (the HTML Recommendation's appendix on non-ASCII
characters in URIs recommends this; the URI RFC itself does not fix a
charset). However, the HTML Recommendation's section on form data is vague
about character encoding, especially if the form is submitted as a MIME
message (multipart/form-data) instead of being URL-encoded. In practice, the
major browsers typically submit form data in the same charset as the HTML
document containing the form.
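
To make the difference concrete, here is a small Python sketch (my choice of
language, not something from the thread) showing how the same text is
URL-encoded when the bytes are UTF-8 versus ISO-8859-1. The sample string is
made up for illustration; the calls are from the standard urllib.parse module.

```python
from urllib.parse import quote, unquote

# Sample text with one non-ASCII character (illustrative value only).
text = "café"

# Characters are turned into bytes in the given charset, then each
# non-ASCII byte is written as %XX.
utf8_form = quote(text, encoding="utf-8")          # 'caf%C3%A9'
latin1_form = quote(text, encoding="iso-8859-1")   # 'caf%E9'

# A receiver that assumes UTF-8 reads the first form correctly...
roundtrip_ok = unquote(utf8_form, encoding="utf-8")

# ...but gets a replacement character (U+FFFD) for the second form,
# because the lone byte 0xE9 is not valid UTF-8.
roundtrip_bad = unquote(latin1_form, encoding="utf-8")

print(utf8_form, latin1_form, roundtrip_ok, repr(roundtrip_bad))
```

The point is only that the %XX sequences carry bytes, not characters, so both
ends have to agree on the charset.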

To encourage the browser to send URL-encoded UTF-8 form data, you should
make sure that the HTML document with the form is itself UTF-8 encoded, and
declares itself as such, usually via the appropriate <meta> element. Even
then, there is still a risk that the user will override the encoding on
their end, and there is not much you can do about that.
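
A rough Python sketch of both halves of this advice follows: a form page that
is UTF-8 encoded and says so in a <meta> element, and a receiving function
that tries UTF-8 first and falls back to another charset if the bytes are not
valid UTF-8. The action URL "/submit", the field name "q", the function name
decode_form_value, and the ISO-8859-1 fallback are all assumptions made for
the example, not anything the specs or the browsers prescribe.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical form page; "/submit" and "q" are placeholder names.
FORM_PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Search</title>
</head>
<body>
<form method="post" action="/submit">
<input type="text" name="q">
<input type="submit">
</form>
</body>
</html>
"""

# The bytes actually served must match the declared charset.
FORM_PAGE_BYTES = FORM_PAGE.encode("utf-8")


def decode_form_value(raw: str, fallback: str = "iso-8859-1") -> str:
    """Decode one URL-encoded form value, preferring UTF-8.

    The fallback charset is a guess for the case where the user
    overrode the encoding in the browser.
    """
    # In URL-encoded form data, '+' stands for a space.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


print(decode_form_value("caf%C3%A9"))  # UTF-8 submission
print(decode_form_value("caf%E9"))     # ISO-8859-1 submission
```

The fallback is a heuristic only: some byte sequences are valid in both
charsets, so a wrong guess is still possible.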

-Mike


