From: Mike Ayers (mayers@celequest.com)
Date: Tue Apr 11 2006 - 16:12:47 CST
tom.kirkpatrick@virusbtn.com wrote:
> Which one of these looks like a proper UTF-8 character: é or Ã© ?
Neither. There is no such thing as a "UTF-8 character", just "UTF-8
encoded Unicode data". In most cases I would be nitpicking to point
this out, but in this case I think it is the cause of your problem:
Characters:                     é     Ã©
Unicode code points (decimal):  233   195 169
Unicode code points (hex):      E9    C3 A9
It is interesting to note that the byte sequence C3 A9 is the UTF-8
encoding of the code point E9 (U+00E9).
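To make that relationship concrete, here is a minimal Python sketch. The variable names are mine, purely for illustration; the only things taken from the message above are the code points and byte values.

```python
# Illustrative sketch; variable names are assumptions, not from the thread.
e_acute = "\u00e9"                          # é, code point 233 (hex E9)

utf8_bytes = e_acute.encode("utf-8")        # two bytes: C3 A9
latin1_bytes = e_acute.encode("latin-1")    # one byte:  E9 (ISO 8859-1)

# Reading the two UTF-8 bytes as ISO 8859-1 yields two characters, Ã and ©.
misread = utf8_bytes.decode("latin-1")
code_points = [ord(ch) for ch in misread]   # 195 and 169

print(utf8_bytes.hex(" "), latin1_bytes.hex(), misread, code_points)
```

So the same character is one byte in ISO 8859-1 and two bytes in UTF-8, and those two bytes look like "Ã©" to anything that assumes ISO 8859-1.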
> Basically, if I enter the character 'é' (egrave) into my database, when
> trying to display it on a webpage, it displays as a '?'. If I try to enter
> it as 'Ã©' It displays ok. So does this mean that the correct way to type
> an 'é' is to actually type 'Ã©'?
No. It means that you should not handle text as binary. What you are
doing is entering ISO 8859-1 characters (bytes) at one end, then
interpreting the same byte stream as UTF-8 encoded Unicode at the other,
which is why you have to enter gobbledygook in order to get the result
you desire.
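A rough Python sketch of that mismatch, assuming the bytes go in as ISO 8859-1 and come out decoded as UTF-8. The function display_as_utf8 is a hypothetical stand-in for whatever renders the page, not anything from the actual setup being discussed.

```python
def display_as_utf8(stored: bytes) -> str:
    """Hypothetical stand-in for the web page: decode stored bytes as UTF-8.

    Invalid sequences become U+FFFD, which many browsers draw as a '?' glyph.
    """
    return stored.decode("utf-8", errors="replace")


# What gets stored when the input side writes ISO 8859-1 bytes.
typed_e_acute = "\u00e9".encode("latin-1")          # é  -> byte E9
typed_mojibake = "\u00c3\u00a9".encode("latin-1")   # Ã© -> bytes C3 A9

shown_bad = display_as_utf8(typed_e_acute)    # E9 alone is not valid UTF-8
shown_ok = display_as_utf8(typed_mojibake)    # C3 A9 decodes to é

print(repr(shown_bad), repr(shown_ok))
```

The second case only "works" because the two errors cancel out; the stored data is still wrong for any consumer that reads it as ISO 8859-1.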
My guess is that your database is in ISO 8859-1 format, and your web
page declares UTF-8 (there are many ways to get this particular error,
so this is only a guess). What you need to do is verify that your data
is being extracted from the database as UTF-8 data. The storage fields
are of type N* (e.g. NVARCHAR), correct?
HTH,
/|/|ike