FW: information request; using unicode in HTML form; urlencoded

From: Magda Danish (Unicode) (v-magdad@microsoft.com)
Date: Fri Oct 06 2000 - 13:04:53 EDT


-----Original Message-----
From: Hung Le [mailto:hle@comergent.com]
Sent: Thursday, October 05, 2000 3:21 PM
To: 'info@unicode.org'
Subject: information request; using unicode in HTML form; urlencoded

        Hi,

        Our company is exploring the idea of using Unicode in our web pages.
We ran into a problem that, despite two weeks of research, we have not been
able to solve. The problem concerns passing text from an HTML form to the
web server.

        From the user's perspective:
                . we present the user with a web page containing a form
                . the user fills in the form
                . the user clicks on "Submit"
                . the browser posts the entered data to the server
        
        From what I can gather so far, the data flow is as follows:
                . when the user clicks on the submit button, the browser
                URL-encodes the data using the following algorithm:

The ASCII characters 'a' through 'z', 'A' through 'Z', and '0' through '9'
remain the same.
The space character ' ' is converted into a plus sign '+'.
All other characters are converted into the 3-character string "%xy", where
xy is the two-digit hexadecimal representation of the lower 8 bits of the
character.

                The last rule will clip each Unicode character to an 8-bit
representation, and thus the data entered in the HTML form will not make it
back to the web server intact.
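
To make the concern concrete, here is a minimal Python sketch of the three
rules exactly as quoted above. The function name urlencode_low8 is
hypothetical, and this is only an illustration of the rules as I understand
them, not a claim about what any particular browser actually does.

```python
import string

# Characters that the quoted rules leave unchanged.
UNRESERVED = set(string.ascii_letters + string.digits)


def urlencode_low8(text: str) -> str:
    """Encode text per the three quoted rules (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)                  # rule 1: a-z, A-Z, 0-9 unchanged
        elif ch == " ":
            out.append("+")                 # rule 2: space becomes '+'
        else:
            low8 = ord(ch) & 0xFF           # rule 3: keep only the lower 8 bits
            out.append("%{:02X}".format(low8))
    return "".join(out)


print(urlencode_low8("a b"))       # a+b
print(urlencode_low8("\u00e9"))    # %E9  (U+00E9 fits in 8 bits)
print(urlencode_low8("\u4e2d"))    # %2D  (U+4E2D loses its upper byte)
print(urlencode_low8("-"))         # %2D  (same output as U+4E2D)
```

Under these rules U+4E2D and the ASCII hyphen both come out as "%2D", so the
server has no way to tell which character the user typed.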

        Do you have experience in this area? How does one capture the data
in an HTML form as Unicode and send it along when the user clicks on the
"Submit" button?

        Thanks for any help you can provide.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT