Hi James,
In ASP programming, there are a number of statements that you should
include to accomplish proper internationalization and character set
handling:
1. Session.LCID = 1036 ' or whatever
- be sure to set the Session.LCID property to the proper locale for the
user. Setting the LCID property causes locale-sensitive functions such
as "FormatDateTime" (as well as others) to return results that are
correctly formatted for that locale. You can determine the proper
locale by parsing the "Accept-Language" HTTP header, which ASP exposes
as the "HTTP_ACCEPT_LANGUAGE" server variable. The example above is the
locale ID for French (France), fr-FR.
2. Session.CodePage = 65001 ' or whatever
- be sure to set the Session.CodePage property to the character set
that the browser should receive. Setting the CodePage property causes
the "Response.Write" method to transcode UCS-2 data to that character
set. In the example above, "65001" means UTF-8, so Response.Write will
transcode from UCS-2 to UTF-8. In addition, setting the CodePage
property causes the "Request.Form" collection to interpret incoming
octets as UTF-8.
3. Response.CharSet = "UTF-8" ' or whatever
- be sure to set the Response.CharSet property to the character set
that the browser should receive. Setting Response.CharSet causes the
charset parameter of the HTTP Content-Type header to be set to UTF-8,
which tells the browser that the octets coming down from the server
are in UTF-8. Unfortunately, the CharSet property takes a string,
whereas the CodePage property takes a long, so you will have to keep
track of character sets both ways.
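Putting the three settings together, the top of a page might look
roughly like the sketch below. The two helper functions,
LcidFromAcceptLanguage and CharsetForCodePage, are hypothetical names
made up for illustration and are not part of ASP. Their lookup tables
are deliberately tiny, and the en-US fallback is an assumption; a real
page would cover whatever locales and character sets it supports.

```vbscript
<%
' Hypothetical helper: map the first language tag in Accept-Language
' to an LCID. Only a few locales are listed, for illustration.
Function LcidFromAcceptLanguage(header)
    Dim tags, parts, firstTag
    LcidFromAcceptLanguage = 1033            ' assumed fallback: en-US
    tags = Split(header, ",")
    If UBound(tags) < 0 Then Exit Function   ' header missing or empty
    parts = Split(tags(0), ";")              ' drop any ";q=..." weight
    If UBound(parts) < 0 Then Exit Function
    firstTag = LCase(Trim(parts(0)))
    Select Case firstTag
        Case "fr", "fr-fr": LcidFromAcceptLanguage = 1036
        Case "de", "de-de": LcidFromAcceptLanguage = 1031
        Case "en-gb":       LcidFromAcceptLanguage = 2057
    End Select
End Function

' Hypothetical helper: keep the numeric code page and the charset
' name in one place, since CodePage takes a long and CharSet a string.
Function CharsetForCodePage(cp)
    Select Case cp
        Case 65001: CharsetForCodePage = "UTF-8"
        Case 1252:  CharsetForCodePage = "windows-1252"
        Case 932:   CharsetForCodePage = "shift_jis"
        Case Else:  CharsetForCodePage = ""
    End Select
End Function

Dim pageCodePage
pageCodePage = 65001                         ' UTF-8

Session.LCID = LcidFromAcceptLanguage( _
    CStr(Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")))
Session.CodePage = pageCodePage              ' transcoding for Response.Write / Request.Form
Response.CharSet = CharsetForCodePage(pageCodePage)   ' charset in Content-Type

' Locale-sensitive output now follows Session.LCID.
Response.Write FormatDateTime(Now(), vbLongDate)
%>
```

Deriving the CharSet string from the same variable that feeds CodePage
is one way to keep the two from drifting apart, which is the mismatch
that tends to produce the "funny characters" in the browser.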
-Paul
-----Original Message-----
From: Magda Danish (Unicode)
Sent: Fri 14/09/2001 15:06
To: unicode@unicode.org
Cc:
Subject: FW: Unicode and the UTF8 encoding in HTML
-----Original Message-----
From: James Gardner [mailto:james@robertsonlanguages.co.uk]
Sent: Thursday, September 13, 2001 1:23 AM
To: info@unicode.org
Subject: Questions
Dear Sir / Madam,
I am currently quite confused about the link between Unicode and the
UTF8 character encoding in HTML. And also the link to how files are
saved.
I have a Microsoft Active Server Page which is saved as an ANSI file.
In this file I specify that it should use UTF8 encoding. The data
(text) that is put into the page when it is created by the server is
stored as unicode. The problem is that when it reaches a browser it
seems to have been converted to Western European format; if I change
the encoding back to UTF it displays funny characters. Is this because
the file is an ANSI file and Windows is doing a conversion to the most
appropriate format behind my back? Do I need to save a file as unicode
as well as specifying utf8 encoding to properly display unicode on the
web?
Any help would be most appreciated!
Regards,
James Gardner
This archive was generated by hypermail 2.1.2 : Sat Sep 15 2001 - 09:42:47 EDT