RE: Unicode site

From: Chris Wendt (christw@microsoft.com)
Date: Tue Oct 03 2000 - 23:23:21 EDT


From: Raghu Kolluru [mailto:raghu.kolluru@dig.com]

>I am using IE 5.5. I copy/paste some international content into a html form
>and submit, it doesnt send the data to the cgi as unicode for the first
>time
>but sends it as unicode anything that I append to the trash from the first
>time.
>Is this some thing to do with IE 5.5 ?

Internet Explorer submits form data in the current browser encoding. The
current browser encoding is the encoding you see marked with a black dot in
the View.Encoding menu. If the form page is tagged with a charset Internet
Explorer understands, then the "current browser encoding" matches the
charset tag.

Your CGI can find out what the browser encoding was by inspecting a hidden
field named "_charset_", provided you placed one in the form; Internet
Explorer fills in its value with the encoding it used for the submission.
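
As a rough sketch of the CGI side, assuming a Python handler and a
URL-encoded form body: the function name decode_form and the windows-1252
fallback are made up for illustration and are not part of any Internet
Explorer or CGI API. The idea is to read "_charset_" first and then decode
the remaining fields with whatever encoding it names.

```python
from urllib.parse import parse_qs


def decode_form(raw_query: str, fallback: str = "windows-1252") -> dict:
    """Decode a URL-encoded form body using the charset named in _charset_.

    raw_query is the still percent-encoded body, e.g. "name=caf%E9&...".
    The fallback encoding is an assumption for forms without _charset_.
    """
    # First pass: Latin-1 maps every byte to some character, so this cannot
    # fail and is enough to read the ASCII-only _charset_ value.
    probe = parse_qs(raw_query, keep_blank_values=True, encoding="latin-1")
    charset = probe.get("_charset_", [fallback])[0] or fallback

    # Second pass: decode every field with the encoding the browser reported.
    try:
        fields = parse_qs(raw_query, keep_blank_values=True,
                          encoding=charset, errors="replace")
    except LookupError:
        # The browser named an encoding this Python build does not know.
        fields = parse_qs(raw_query, keep_blank_values=True,
                          encoding=fallback, errors="replace")

    # Keep only the first value of each field for simplicity.
    return {name: values[0] for name, values in fields.items()}
```

For example, decode_form("name=caf%E9&_charset_=windows-1252") and
decode_form("name=caf%C3%A9&_charset_=utf-8") should both yield the same
"name" value, because each body is decoded with the encoding it declares.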

My suspicion is that you are calling a string that looks like "&#12345;"
Unicode. Strictly speaking, though, this is not Unicode; it is an HTML 4
numeric character reference (which in turn happens to refer to a Unicode
code point in decimal). Internet Explorer submits all characters in a form
that do not fit into the current encoding as numeric character references.
The characters that do fit are submitted in the current encoding.
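
The behavior described above can be imitated in a few lines of Python. This
is only a sketch of the effect, not Internet Explorer's actual code: the
helper names simulate_submit and recover_text are invented here, and the
URL-encoding step of a real form submission is left out. Python's
"xmlcharrefreplace" error handler happens to produce the same decimal
references.

```python
import html


def simulate_submit(text: str, page_encoding: str) -> bytes:
    """Encode form text the way the reply describes the browser doing it."""
    # Characters that fit are encoded normally; each one that does not fit
    # is replaced by a decimal numeric character reference such as &#12345;.
    return text.encode(page_encoding, errors="xmlcharrefreplace")


def recover_text(submitted: bytes, page_encoding: str) -> str:
    """Server side: decode the bytes, then expand character references."""
    decoded = submitted.decode(page_encoding)
    # Caveat: html.unescape also expands named references like &amp;, and it
    # cannot tell a reference the browser generated from one the user typed.
    return html.unescape(decoded)


wire = simulate_submit("caf\u00e9 \u3039", "windows-1252")
print(wire)                                # é fits, U+3039 does not
print(recover_text(wire, "windows-1252"))  # original text restored
```

On a windows-1252 page the "é" goes out as the single byte 0xE9, while
U+3039 goes out as the ASCII string "&#12345;". With "utf-8" as the page
encoding, both characters fit and no references appear.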

It seems you characterized the characters submitted in the current encoding
as "trash" and the numeric character references as "Unicode".

To see all characters submitted in UTF-8, provide your form page in UTF-8
(and label the page as such) or in UCS-2 (no label needed, just the BOM).

There is no way to force Internet Explorer to submit every character as a
numeric character reference.

Chris..


