RE: Unicode and the UTF8 encoding in HTML

From: Paul Deuter (Paul.Deuter@plumtree.com)
Date: Sat Sep 15 2001 - 10:42:14 EDT


Hi James,
 
In ASP programming, there are a number of statements that you should
include to accomplish proper internationalization and character set
handling:
 
1. Session.LCID = 1036 ' or whatever
- be sure to set the Session.LCID property to the proper locale for the
user. By setting the LCID property, you will cause locale-sensitive
functions such as "FormatDateTime" (as well as others) to return results
that are correctly formatted for the locale. You can determine the
proper locale by parsing the Accept-Language HTTP header, which ASP
exposes as Request.ServerVariables("HTTP_ACCEPT_LANGUAGE"). The value in
the example above, 1036, is the locale ID for French (France), fr-FR.
 
2. Session.CodePage = 65001 ' or whatever
- be sure to set the Session.CodePage property to the desired character
set that the browser will see. By setting the CodePage property, you
will cause the "Response.Write" method to transcode UCS-2 data to that
character set. In the example above, "65001" means UTF-8, so
Response.Write will transcode from UCS-2 to UTF-8. In addition, setting
the CodePage property will cause the "Request.Form" collection to
interpret incoming octets as UTF-8 as well.
 
3. Response.CharSet = "UTF-8" ' or whatever
- be sure to set the Response.CharSet property to the desired character
set that the browser will see. By setting Response.CharSet, you will
cause the charset parameter of the HTTP Content-Type header to be set to
UTF-8. This will tell the browser that the octets that are coming down
from the server are in UTF-8. Unfortunately, the CharSet property takes
a string data type, whereas the CodePage property takes a long. So you
will have to keep track of character sets both ways.
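
Put together, the three settings above might look like the sketch below
at the top of a page. This is only an illustration, not a tested
implementation: the helper LcidFromAcceptLanguage and its small lookup
table are hypothetical (they are not part of ASP), and the fallback to
1033 (en-US) is an assumption. A real page would need a fuller table and
would probably honour the q-values in the header.

```vbscript
<%@ Language="VBScript" %>
<%
' Hypothetical helper (not part of ASP): map the first language tag of
' an Accept-Language value to a Windows locale ID. The table is
' deliberately tiny and only illustrative.
Function LcidFromAcceptLanguage(header)
    Dim firstEntry, tag
    ' First entry of e.g. "fr-FR,fr;q=0.9,en;q=0.8" is "fr-FR".
    ' Appending the delimiter guarantees Split returns an element 0
    ' even when the header is empty.
    firstEntry = Split(header & ",", ",")(0)
    ' Drop any ";q=..." parameter and normalise case.
    tag = LCase(Trim(Split(firstEntry & ";", ";")(0)))
    Select Case tag
        Case "fr-fr", "fr"
            LcidFromAcceptLanguage = 1036   ' French (France)
        Case "de-de", "de"
            LcidFromAcceptLanguage = 1031   ' German (Germany)
        Case Else
            LcidFromAcceptLanguage = 1033   ' assumed fallback: en-US
    End Select
End Function

' 1. Locale for FormatDateTime and other locale-sensitive functions.
Session.LCID = LcidFromAcceptLanguage( _
    Request.ServerVariables("HTTP_ACCEPT_LANGUAGE"))

' 2. Code page used by Response.Write and Request.Form (65001 = UTF-8).
Session.CodePage = 65001

' 3. charset parameter of the Content-Type header; set it before any
'    output is sent so that it matches the code page above.
Response.CharSet = "UTF-8"

Response.Write FormatDateTime(Now(), vbLongDate)
%>
```

Note that the code page (a number) and the charset (a string) are set
separately here, so the two values have to be kept in agreement by hand.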
 
-Paul

        -----Original Message-----
        From: Magda Danish (Unicode)
        Sent: Fri 14/09/2001 15:06
        To: unicode@unicode.org
        Cc:
        Subject: FW: Unicode and the UTF8 encoding in HTML
        
        


        -----Original Message-----
        From: James Gardner [ mailto:james@robertsonlanguages.co.uk]
        Sent: Thursday, September 13, 2001 1:23 AM
        To: info@unicode.org
        Subject: Questions
        
        
        Dear Sir / Madam,
                I am currently quite confused about the link between Unicode and
        the UTF8 character encoding in HTML. And also the link to how files are
        saved.
                I have a Microsoft Active Server Page which is saved as an ANSI
        file. In this file I specify that it should use UTF8 encoding. The data
        (text) that is put into the page when it is created by the server is
        stored as unicode. The problem is that when it reaches a browser it
        seems to have been converted to Western European format, if I change the
        encoding back to UTF it displays funny characters. Is this because the
        file is an ANSI file and Windows is doing a conversion to the most
        appropriate format behind my back? Do I need to save a file as unicode
        as well as specifying utf8 encoding to properly display unicode on the
        web?
                        Any help would be most appreciated!
                                        Regards,
                                                James Gardner
        
        
        





This archive was generated by hypermail 2.1.2 : Sat Sep 15 2001 - 09:42:47 EDT