Re: Unicode in web pages

From: Vinod Balakrishnan (vinod@filemaker.com)
Date: Tue Sep 05 2000 - 14:09:42 EDT


Hi Stephen,

I had a similar problem last year when I was using an older version of
the JDK with a Java tool.

This is my tip.

Enter some Japanese text in the form and check the values of the
characters at each stage: the data from the browser (make sure you
are getting UTF-8), the data before WebLogic sends it to the DB, and the
final data stored in the DB.
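
As a rough illustration of that check, here is a minimal Java sketch that
prints the code points of a string and the hex values of its bytes, so the
same text can be compared at each stage. The class and method names
(EncodingTrace, codePoints, hex) are made up for this example; they are not
part of WebLogic, Tomcat, or JDBC.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper; the names are illustrative only.
class EncodingTrace {
    // Lists the Unicode code points of a string, e.g. "U+65E5 U+672C".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Lists raw bytes as two-digit hex values, e.g. "E6 97 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Japanese characters, written as escapes so the source file
        // encoding cannot interfere with the test.
        String text = "\u65E5\u672C";
        System.out.println(codePoints(text));                            // U+65E5 U+672C
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8))); // E6 97 A5 E6 9C AC
    }
}
```

If the code points logged in the JSP already show U+003F where a Japanese
character was typed, the damage happened before the database was involved.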

Check where the data is getting converted between UTF-8 and UTF-16.
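
A bad conversion can often be recognised by its symptoms. The sketch below
is only an illustration: it assumes ISO-8859-1 as a stand-in for whatever
default code page a server might apply (that is a guess, not something
confirmed in this thread), and the class and method names are invented. It
reproduces two effects described in this thread: question marks for
characters the code page lacks, and UTF-8 bytes showing up as pairs of
single-byte symbols.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is an assumed "server default".
class LossyConversionDemo {
    // Round-trips text through a single-byte charset. String.getBytes
    // replaces every character the charset cannot represent with '?'.
    static String throughLatin1(String s) {
        byte[] legacy = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(legacy, StandardCharsets.ISO_8859_1);
    }

    // Encodes as UTF-8 but decodes as ISO-8859-1, so each byte of a
    // multi-byte sequence turns into its own character.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // e-acute survives; the Japanese character and the Euro sign do not.
        System.out.println(throughLatin1("\u00E9\u65E5\u20AC"));
        // e-acute (UTF-8 bytes C3 A9) comes back as two characters.
        System.out.println(utf8ReadAsLatin1("\u00E9").length()); // 2
    }
}
```

The first effect loses data for good, since the original character cannot
be recovered from '?'. The second is reversible as long as the bytes
themselves were stored unchanged.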

When you query the database, find out whether you are getting the data in
UTF-8, UTF-16, or Shift JIS. Finally, if the data is stored in UTF-8, check
whether the database or the querying application can display the Japanese
text stored in UTF-8.
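
If you can get at the raw bytes that come back from the query, one way to
narrow down the encoding is to decode them strictly and see which charset
accepts them. This is a sketch under that assumption; the EncodingCheck
class and its strictDecode method are hypothetical names, and how you obtain
the bytes depends on your driver.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not tied to any particular database or driver.
class EncodingCheck {
    // Returns the decoded text if the bytes are well-formed in the given
    // charset, or null if the decoder reports an error.
    static String strictDecode(byte[] bytes, Charset cs) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPORT)
                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                     .decode(ByteBuffer.wrap(bytes))
                     .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u65E5\u672C".getBytes(StandardCharsets.UTF_8);
        // The same two characters as Shift_JIS bytes.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};

        System.out.println(strictDecode(utf8, StandardCharsets.UTF_8) != null); // true
        System.out.println(strictDecode(sjis, StandardCharsets.UTF_8) != null); // false

        // Shift_JIS is not one of the charsets every JVM must provide.
        if (Charset.isSupported("Shift_JIS")) {
            String decoded = strictDecode(sjis, Charset.forName("Shift_JIS"));
            System.out.println("\u65E5\u672C".equals(decoded));
        }
    }
}
```

A successful strict decode is a hint rather than proof, because short byte
sequences can be valid in more than one encoding.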

-Vinod

>
>Does that mean that text input from a web page must be changed from its
>UTF-8 encoding to UCS-2 for storage in SQL Server? If so, are there any
>converters out there?
>Can UCS-2 be used as the encoding for a web page, or must conversion be done
>between the two encodings?
>
>>From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
>>To: "Unicode List" <unicode@unicode.org>
>>Subject: Re: Unicode in web pages
>>Date: Mon, 4 Sep 2000 10:48:29 -0800 (GMT-0800)
>>
>>Yep, a question mark is the character that Windows will substitute for any
>>character that is not on the code page being used for conversion. Since
>>you should be in UCS-2 most of the time (both SQL Server and Java use
>>it, right?), the problem would be in the conversion that was supposed to
>>get it to UTF-8. Is some other code page being used, like the server
>>default?
>>
>>michka
>>
>>
>>----- Original Message -----
>>From: "Mark Davis" <markdavis@ispchannel.com>
>>To: "Unicode List" <unicode@unicode.org>
>>Cc: "Unicode List" <unicode@unicode.org>
>>Sent: Monday, September 04, 2000 10:32 AM
>>Subject: Re: Unicode in web pages
>>
>>
>> > Sounds like somewhere in the process bytes are getting interpreted as the
>> > wrong character set. For example, if you take a Unicode source, convert to
>> > cp1252, then convert to UTF-8, you will get question marks on Windows or in
>> > Java for the characters above FF, while the ones below (including some
>> > European ones) will be correct UTF-8 characters.
>> >
>> > Mark
>> >
>> > BTW, there is a FAQ page on the Unicode site
>> > (http://www.unicode.org/unicode/faq/) about web pages. I am wondering
>> > whether you looked at it, and if so whether you found it useful. Feedback
>> > would help to improve those pages.
>> >
>> > Stephen Toner wrote:
>> >
>> > > The character is posted in a form, and the receiving page opens a
>> > > connection to a SQL Server 7.0 database using the WebLogic JDBC:ODBC
>> > > driver, which supports Unicode. The Java string is then passed to the
>> > > database.
>> > >
>> > > I have now found that the symbols in the database were indeed the UTF-8
>> > > version of the characters eg = . This was for some European characters
>> > > only.
>> > > However, many characters in languages such as Japanese (and the Euro
>> > > symbol) reach the database not in their correct form but with question
>> > > marks in them. I don't know where the problem is occurring. How does the
>> > > character get converted into these UTF-8 sequences, and could there be a
>> > > problem with this - possibly it doesn't recognise the character that it
>> > > should be converting? (Just a mad stab in the dark.)
>> > >
>> > > Because UTF-8 is a sequence of bytes, does that mean that it could be
>> > > treated and stored as ASCII, and that the sequence would be recombined to
>> > > Unicode on output if the encoding was set to UTF-8?
>> > >
>> > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
>> > > >To: "Unicode List" <unicode@unicode.org>
>> > > >Subject: Re: Unicode in web pages
>> > > >Date: Mon, 4 Sep 2000 05:04:08 -0800 (GMT-0800)
>> > > >
>> > > >Well, the client side is right if you are using UTF-8 and the browser
>> > > >does indeed show UTF-8 as the encoding being used (how to check this
>> > > >depends on your browser -- View|Encoding or Edit|Preferences), so there
>> > > >must be some issue on the server side.
>> > > >
>> > > >You may need to post more detail on the database, how you are getting to
>> > > >it, etc. so someone who knows more about the server config can comment.
>> > > >
>> > > >michka
>> > > >
>> > > >
>> > > >----- Original Message -----
>> > > >From: "Stephen Toner" <toners5@hotmail.com>
>> > > >To: <michka@trigeminal.com>; <unicode@unicode.org>
>> > > >Sent: Monday, September 04, 2000 7:12 AM
>> > > >Subject: Re: Unicode in web pages
>> > > >
>> > > >
>> > > > > I am using JSP on the server side, and am using the Tomcat server.
>> > > > >
>> > > > >
>> > > > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
>> > > > > >Reply-To: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
>> > > > > >To: "Stephen Toner" <toners5@hotmail.com>, "Unicode List"
>> > > > > ><unicode@unicode.org>
>> > > > > >Subject: Re: Unicode in web pages
>> > > > > >Date: Mon, 4 Sep 2000 04:57:18 -0700
>> > > > > >
>> > > > > >UTF-8 is indeed the character set you want to use for the page
>> > > > > >encoding; although some browsers will support UTF-16, etc., not all
>> > > > > >will.
>> > > > > >
>> > > > > >But the real issue has to do with what technology you are using to
>> > > > > >connect to the db. Is it ASP on the server side? Or something else?
>> > > > > >And what is the server?
>> > > > > >
>> > > > > >michka
>> > > > > >
>> > > > > >
>> > > > > >----- Original Message -----
>> > > > > >From: "Stephen Toner" <toners5@hotmail.com>
>> > > > > >To: "Unicode List" <unicode@unicode.org>
>> > > > > >Sent: Monday, September 04, 2000 4:21 AM
>> > > > > >Subject: Unicode in web pages
>> > > > > >
>> > > > > >
>> > > > > > > Hi,
>> > > > > > > I'm fairly new to Unicode and have a few problems trying to
>> > > > > > > input it from a browser.
>> > > > > > > I need to take input from a web page and store it in a
>> > > > > > > database. Web pages are then driven from this database. We want
>> > > > > > > to use Unicode to allow multilingual support. I was wondering if
>> > > > > > > anyone could tell me of any issues likely to be faced in this
>> > > > > > > process.
>> > > > > > > Our database is capable of storing Unicode, but I'm not sure if
>> > > > > > > what is reaching the database is actually Unicode. Using IE 5.5,
>> > > > > > > a textarea in a form is submitted containing any entered text. I
>> > > > > > > have tried specifying the page's character set as UTF-8. What
>> > > > > > > then reaches the database is a series of ASCII values, with
>> > > > > > > foreign characters such as Japanese, or accented characters,
>> > > > > > > converted to a few symbols. I don't know if this is Unicode,
>> > > > > > > where when I look at it in the database the multi-byte
>> > > > > > > characters can be seen as a combination of single-byte
>> > > > > > > (gibberish) characters.
>> > > > > > > If this isn't Unicode, do I need to put in some sort of
>> > > > > > > converter to change to &#xxxx; format? Some web sites seem to
>> > > > > > > say that for HTML, Unicode must be changed to this numeric
>> > > > > > > character reference format.
>> > > > > > > I would appreciate any advice.
>> > > > > > > Thanks in advance,
>> > > > > > > Stephen
>> > > > > > >
>> > > > >
>> > > >
>> >_________________________________________________________________________
>> > > > > > > Get Your Private, Free E-mail from MSN Hotmail at
>> > > > > >http://www.hotmail.com.
>> > > > > > >
>> > > > > > > Share information about yourself, create your own public
>>profile
>>at
>> > > > > > > http://profiles.msn.com.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT