Unicode and mySQL

From: Tom Emerson (Tree@basistech.com)
Date: Wed Apr 05 2000 - 14:31:01 EDT

Hello Liz,

I saw your message sent to the Unicode Consortium regarding the use of
Unicode in mySQL, serving on the backend of a multilingual website, and
wanted to share my experience with you.

I've been using mySQL to store (Simplified and Traditional) Chinese and
Japanese lexicographic data with great success using UTF-8. The mySQL server
was compiled with Latin-1 as its character set, and all columns containing
UTF-8 data are declared BINARY. You lose out in two ways when doing this, however:

1) ORDER BY operations will not do what you expect on fields containing
UTF-8 data. In my case I do all of my sorting either on romanized readings
(pinyin) or on a numeric ID.

2) Some character-based operations (such as using LIKE patterns with "%" or
"_") will not behave as expected in UTF-8 fields, depending on the data they

If you can live with these two restrictions, then mySQL will work fine for
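
As a rough illustration of the two restrictions above, here is a minimal
Python sketch of what byte-oriented handling of UTF-8 data looks like. It
does not talk to mySQL at all; it only assumes that a BINARY column is
compared and matched byte by byte. The sample rows and the function names
are hypothetical.

```python
# Hypothetical sample rows: (headword, pinyin reading with tone number).
ENTRIES = [("文", "wen2"), ("中", "zhong1"), ("字", "zi4"), ("典", "dian3")]


def sort_bytewise(entries):
    """Order rows by the raw UTF-8 bytes of the headword.

    This models a bytewise comparison. For UTF-8 the result is code point
    order, which is not a dictionary order a reader would expect.
    """
    return sorted(entries, key=lambda entry: entry[0].encode("utf-8"))


def sort_by_reading(entries):
    """Order rows by the romanized reading instead of the headword."""
    return sorted(entries, key=lambda entry: entry[1])


def bytes_per_character(text):
    """Return how many UTF-8 bytes each character of text occupies.

    Assumption: a byte-oriented LIKE treats "_" as one byte, so a character
    that occupies three bytes would need three underscores to match.
    """
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    print([head for head, _ in sort_bytewise(ENTRIES)])
    print([head for head, _ in sort_by_reading(ENTRIES)])
    print(bytes_per_character("a中"))
```

Sorting on a separate reading or ID column, as described in 1), sidesteps
the byte order entirely.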

You also have no support from mySQL for transcoding between various legacy
encodings and UTF-8, so this needs to be done by hand. However, I've used
Python and Perl to great effect here.
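
For a sense of what transcoding by hand can look like, here is a minimal
sketch using the codecs built into a current Python 3. It is not the script
referred to above; the function name to_utf8 is hypothetical, and the codec
names ("big5", "gb2312", "shift_jis") are the ones Python's standard library
uses.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if raw is not valid in source_encoding, which
    is usually preferable to silently storing damaged text.
    """
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Build some legacy-encoded sample data, then convert it.
    legacy = "中文".encode("big5")
    converted = to_utf8(legacy, "big5")
    print(converted.decode("utf-8"))
```

The converted bytes can then be stored unchanged in a BINARY column.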

I am currently adding support for Unicode to mySQL: announcements will be
made to this list and to the mySQL discussion list when I have something

I make use of PHP3 (as a DSO in Apache 1.3.12) for queries and other
database operations. Make sure that your HTML forms specify UTF-8 as their
character encoding. Using IE5 I'm able to enter Simplified and Traditional
Chinese queries and display the results.
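
To show why the form encoding matters, here is a small Python sketch (not
the PHP3 code mentioned above) of what arrives on the server when a browser
submits a form as UTF-8: the query text is sent as percent-encoded UTF-8
bytes, and the server side has to decode it with the same encoding. The
field name "q" and the function name are hypothetical.

```python
from urllib.parse import parse_qs, quote


def decode_form_query(query_string: str) -> dict:
    """Parse a URL-encoded query string, decoding values as UTF-8.

    errors="strict" makes malformed UTF-8 raise UnicodeDecodeError instead
    of being replaced silently.
    """
    return parse_qs(query_string, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    # What a browser would send for the two-character query below.
    submitted = "q=" + quote("中文")
    print(submitted)
    print(decode_form_query(submitted))
```

If the form were submitted in some other encoding, the same bytes would not
decode to the intended characters, and lookups against UTF-8 data would fail.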

All of this (besides the IE5 part ;-) is done on various versions of
RedHat Linux.

Please feel free to contact me if you have any questions,


Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"
