Re: Unicode in web pages

From: Michael \(michka\) Kaplan (michka@trigeminal.com)
Date: Mon Sep 04 2000 - 14:56:39 EDT


Yep, a question mark is what Windows substitutes for any character that is
not in the code page being used for the conversion. Since you should be in
UCS-2 most of the time (both SQL Server and Java use it, right?), the problem
would be in the conversion that is supposed to produce UTF-8. Is some other
code page being used instead, like the server default?
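
As a rough illustration of the two symptoms in this thread (not code from
anyone's actual setup), here is a small Java sketch. It assumes ISO-8859-1 as
the stray code page, which is only a guess; nothing in the thread establishes
which one the server uses. The class and method names are made up for the
example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConversionDemo {
    // Lossy path: the text is squeezed through a single-byte code page before
    // it is turned into UTF-8. String.getBytes(Charset) replaces every
    // character the code page cannot represent with its replacement byte,
    // which is '?' for ISO-8859-1.
    static String viaCodePage(String text, Charset codePage) {
        byte[] narrowed = text.getBytes(codePage);
        String decoded = new String(narrowed, codePage);
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Mislabelled path: correct UTF-8 bytes are read as if each byte were one
    // character of the code page. The result looks like gibberish, but no
    // information is lost when the code page maps all 256 byte values.
    static String misread(String text, Charset codePage) {
        return new String(text.getBytes(StandardCharsets.UTF_8), codePage);
    }

    // Undo misread(): recover the original bytes, then decode them as UTF-8.
    static String recover(String garbled, Charset codePage) {
        return new String(garbled.getBytes(codePage), StandardCharsets.UTF_8);
    }

    // Render a string as code points so the output does not depend on the
    // console encoding.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset assumed = StandardCharsets.ISO_8859_1; // assumed code page
        String input = "\u00e9\u20ac\u65e5"; // e-acute, Euro sign, a kanji

        System.out.println("lossy:     " + hex(viaCodePage(input, assumed)));
        String garbled = misread(input, assumed);
        System.out.println("misread:   " + hex(garbled));
        System.out.println("recovered: " + hex(recover(garbled, assumed)));
    }
}
```

With that assumption, the lossy path keeps the e-acute and turns the Euro sign
and the kanji into question marks, while the misread text can be converted
back to the original as long as the bytes are eventually decoded as UTF-8.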

michka

----- Original Message -----
From: "Mark Davis" <markdavis@ispchannel.com>
To: "Unicode List" <unicode@unicode.org>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Monday, September 04, 2000 10:32 AM
Subject: Re: Unicode in web pages

> Sounds like somewhere in the process bytes are getting interpreted as the
> wrong character set. For example, if you take a Unicode source, convert to
> cp1252, then convert to UTF-8, you will get question marks on Windows or in
> Java for the characters above FF, while the ones below (including some
> European ones) will be correct UTF-8 characters.
>
> Mark
>
> BTW, there is a FAQ page on the Unicode site
> (http://www.unicode.org/unicode/faq/) about web pages. I am wondering
> whether you looked at it, and if so whether you found it useful. Feedback
> would help to improve those pages.
>
> Stephen Toner wrote:
>
> > The character is posted in a form, and the receiving page opens a
> > connection to a SQL Server 7.0 database using the Weblogic JDBC:ODBC
> > driver which supports Unicode. The Java string is then passed to the
> > database.
> >
> > I have now found that the symbols in the database were indeed the UTF-8
> > version of the characters eg = . This was for some European characters
> > only.
> > However many characters in languages such as Japanese (and the Euro
> > symbol) reach the database not in their correct form but with question
> > marks in them. I don't know where the problem is occurring. How does the
> > character get converted into these UTF-8 sequences, and could there be a
> > problem with this - possibly it doesn't recognise the character that it
> > should be converting (Just a mad stab in the dark)
> >
> > Because UTF-8 is a sequence of bytes, does that mean that it could be
> > treated and stored as ASCII, and that the sequence would be recombined to
> > Unicode on output if the encoding was set to UTF-8?
> >
> > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > >To: "Unicode List" <unicode@unicode.org>
> > >Subject: Re: Unicode in web pages
> > >Date: Mon, 4 Sep 2000 05:04:08 -0800 (GMT-0800)
> > >
> > >Well, the client side is right if you are using UTF-8 and the browser
> > >does indeed show UTF-8 as the encoding being used (how to check this
> > >depends on your browser -- View|Encoding or Edit|Preferences), so there
> > >must be some issue on the server side.
> > >
> > >You may need to post more detail on the database, how you are getting to
> > >it, etc. so someone who knows more about the server config can comment.
> > >
> > >michka
> > >
> > >
> > >----- Original Message -----
> > >From: "Stephen Toner" <toners5@hotmail.com>
> > >To: <michka@trigeminal.com>; <unicode@unicode.org>
> > >Sent: Monday, September 04, 2000 7:12 AM
> > >Subject: Re: Unicode in web pages
> > >
> > >
> > > > I am using JSP on the server side, with the Tomcat server.
> > > >
> > > >
> > > > >From: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > > >Reply-To: "Michael \(michka\) Kaplan" <michka@trigeminal.com>
> > > > >To: "Stephen Toner" <toners5@hotmail.com>, "Unicode List"
> > > > ><unicode@unicode.org>
> > > > >Subject: Re: Unicode in web pages
> > > > >Date: Mon, 4 Sep 2000 04:57:18 -0700
> > > > >
> > > > >UTF-8 is indeed the character set you want to use for the page
> > > > >encoding; although some browsers will support UTF-16, etc., not all
> > > > >will.
> > > > >
> > > > >But the real issue has to do with what technology you are using to
> > > > >connect to the db. Is it ASP on the server side? Or something else?
> > > > >And what is the server?
> > > > >
> > > > >michka
> > > > >
> > > > >
> > > > >----- Original Message -----
> > > > >From: "Stephen Toner" <toners5@hotmail.com>
> > > > >To: "Unicode List" <unicode@unicode.org>
> > > > >Sent: Monday, September 04, 2000 4:21 AM
> > > > >Subject: Unicode in web pages
> > > > >
> > > > >
> > > > > > > Hi,
> > > > > > > I'm fairly new to Unicode and have a few problems trying to
> > > > > > > input it from a browser.
> > > > > > > I need to take input from a web page and store it in a
> > > > > > > database. Web pages are then driven from this database. We want
> > > > > > > to use Unicode to allow multi-lingual support. I was wondering
> > > > > > > if anyone could tell me of any issues likely to be faced in
> > > > > > > this process.
> > > > > > > Our database is capable of storing Unicode, but I'm not sure if
> > > > > > > what is reaching the database is actually Unicode. Using IE
> > > > > > > 5.5, a textarea in a form is submitted containing any entered
> > > > > > > text. I have tried specifying the page's character set as
> > > > > > > UTF-8. What then reaches the database is a series of ASCII
> > > > > > > values with foreign characters such as Japanese, or accented
> > > > > > > characters, converted to a few symbols. I don't know if this is
> > > > > > > Unicode: when I look at it in the database, the multi-byte
> > > > > > > characters can be seen as a combination of single-byte
> > > > > > > (gibberish) characters.
> > > > > > > If this isn't Unicode, do I need to put in some sort of
> > > > > > > converter to change to &#xxxx; format? Some web sites seem to
> > > > > > > say that for HTML, Unicode must be changed to this numeric
> > > > > > > character reference format.
> > > > > > > I would appreciate any advice.
> > > > > > Thanks in advance,
> > > > > > Stephen
> > > > > >
> > > > > > > _________________________________________________________________________
> > > > > > > Get Your Private, Free E-mail from MSN Hotmail at
> > > > > > > http://www.hotmail.com.
> > > > > > >
> > > > > > > Share information about yourself, create your own public profile
> > > > > > > at http://profiles.msn.com.
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> >
>_________________________________________________________________________
> > > > Get Your Private, Free E-mail from MSN Hotmail at
> > >http://www.hotmail.com.
> > > >
> > > > Share information about yourself, create your own public profile at
> > > > http://profiles.msn.com.
> > > >
> > > >
> > >
> >
> >
_________________________________________________________________________
> > Get Your Private, Free E-mail from MSN Hotmail at
http://www.hotmail.com.
> >
> > Share information about yourself, create your own public profile at
> > http://profiles.msn.com.
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT