From: tom.kirkpatrick@virusbtn.com
Date: Tue Apr 11 2006 - 17:38:36 CST
Mike,
thanks, that really helped clear a few things up for me, and with this
knowledge I have just found the source of (one of) my problems: the data
I was trying to enter into the database had been saved in ANSI format. I
have re-saved the SQL script and reimported it into my database (MySQL),
which appears to have cured the random '?' problem...
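
The re-save step described above can be sketched in Python. This is only an illustration under stated assumptions: "ANSI" is taken to mean Windows code page 1252 (it can mean other code pages on other systems), the target encoding is assumed to be UTF-8, and the function name `resave_as_utf8` and the file names are hypothetical.

```python
from pathlib import Path


def resave_as_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Decode src using the assumed legacy encoding and write it to dst as UTF-8.

    source_encoding is an assumption: "ANSI" usually means cp1252 on
    Western European Windows systems, but that is not guaranteed.
    """
    text = src.read_bytes().decode(source_encoding)  # bytes -> characters
    dst.write_bytes(text.encode("utf-8"))            # characters -> UTF-8 bytes


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    source = Path("import_ansi.sql")
    if source.exists():
        resave_as_utf8(source, Path("import_utf8.sql"))
```

Decoding to text first and then encoding again is what makes this a conversion; copying the bytes unchanged and merely declaring them UTF-8 would not fix anything.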
However, although that é (eacute), which was previously displaying as '?',
is now displaying correctly directly on my webpage, when I try to show it
in a web form element (a drop-down menu), it now displays as Ã©! So
somewhere along the line it must be being converted back to ISO 8859-1,
right? My web browser knows that the page is in Unicode. I think there is
a possibility that the code that is being used to generate these form
elements may be doing this. If it's not the form generation code, then...
well, it must be, as this is the only thing that is different from
displaying normally on the page.
> The storage fields are of type N* (e.g. NVARCHAR), correct?
It's a MySQL database (although an old one - v 4.0.26) and the storage
fields are of type VARCHAR. As far as I know this version of MySQL doesn't
have much support for encodings, and I'm not sure what encoding it is
currently set to, but I assume that if I enter data as UTF-8 then it will
be stored as UTF-8, right? If not, then I need to set the database to store
things as Unicode somehow, right?
Mike Ayers <mayers@celequest.com>
Sent by: unicode-bounce@unicode.org
11/04/2006 23:12
To: tom.kirkpatrick@virusbtn.com
cc: unicode@unicode.org
Subject: Re: How do I type unicode characters?
tom.kirkpatrick@virusbtn.com wrote:
> Which one of these looks like a proper UTF-8 character: é or Ã© ?
Neither. There is no such thing as a "UTF-8 character", just "UTF-8
encoded Unicode data". In most cases I would be nitpicking to point
this out, but in this case I think it is the cause of your problem:
Characters:          é    Ã    ©
Unicode code points: 233  195  169
Unicode hex points:  E9   C3   A9
It is interesting to note that C3 A9 is the UTF-8 encoding of E9.
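
The relationship in the table above can be checked with a few lines of Python. This is a small sketch rather than anything from the original exchange; the helper name `describe` is made up for illustration.

```python
def describe(char: str) -> dict:
    """Return the code point, UTF-8 bytes, and ISO 8859-1 misreading of one character."""
    utf8_bytes = char.encode("utf-8")
    return {
        "code_point": ord(char),                               # decimal code point
        "utf8_hex": utf8_bytes.hex(" ").upper(),               # UTF-8 bytes as hex
        "misread_as_latin1": utf8_bytes.decode("iso-8859-1"),  # same bytes, wrong decoder
    }


info = describe("\u00e9")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
print(info["code_point"])  # 233
print(info["utf8_hex"])    # C3 A9
# One character per byte, with code points 195 and 169:
print([ord(c) for c in info["misread_as_latin1"]])  # [195, 169]
```

The last line shows why the two-character sequence appears: each of the two UTF-8 bytes is read as a separate ISO 8859-1 character.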
> Basically, if I enter the character 'é' (eacute) into my database, when
> trying to display it on a webpage, it displays as a '?'. If I try to enter
> it as 'Ã©' it displays ok. So does this mean that the correct way to type
> an 'é' is to actually type 'Ã©'?
No. It means that you should not handle text as binary. What you are
doing is entering ISO 8859-1 characters (bytes) from one end, then
interpreting the same stream as UTF-8 encoded Unicode at the other,
which is why you have to enter gobbledygook in order to get the result
you desire.
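
The mismatch described above can be reproduced in Python. This is a simplified sketch that assumes the only two steps are "store the typed text as ISO 8859-1 bytes" and "read those bytes back as UTF-8"; the real pipeline (editor, database, web server, browser) has more stages, and `round_trip` is a made-up name.

```python
def round_trip(typed: str) -> str:
    """Store text as ISO 8859-1 bytes, then read the same bytes back as UTF-8."""
    stored = typed.encode("iso-8859-1")              # what goes in at one end
    return stored.decode("utf-8", errors="replace")  # what comes out at the other


# Typing the real character: the lone byte E9 is not valid UTF-8, so the
# decoder substitutes U+FFFD, which many programs render as '?'.
print(ascii(round_trip("\u00e9")))        # '\ufffd'

# Typing the two-character sequence with code points C3 and A9: the stored
# bytes C3 A9 happen to be the valid UTF-8 encoding of U+00E9.
print(ascii(round_trip("\u00c3\u00a9")))  # '\xe9'
```

Using the same encoding at both ends removes the need for the workaround.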
My guess is that your database is in ISO 8859-1 format, and your web
page declares UTF-8 (there are many ways to get this particular error,
so I guess). What you need to do is verify that your data is being
extracted from the database as UTF-8 data. The storage fields are of
type N* (e.g. NVARCHAR), correct?
HTH,
/|/|ike
-- Tom Kirkpatrick Web Developer - Virus Bulletin