Re: Java, SQL, Unicode and Databases

From: Tex Texin (texin@progress.com)
Date: Fri Jun 23 2000 - 04:00:49 EDT


Addison, thanks for this. Good points.
I am sure that if we bear down on it, we can find many more than two
problems. JDBC driver differences will be a third.
We went through similar issues programming for
double-byte databases a few years back. At least with
Unicode, we are doing this for the last time. ;-)

What prompted the question was some allegations about
constraints on datatypes and the reprogramming that
would result. I am not sure I believe them, so
I don't want to repeat them. I was glad to see the comments
on Oracle and Sybase; I hope we will hear from some others.

Ken is right about UTF-16 being murkier, but not too many
databases are there yet.

Tex

Addison Phillips wrote:
>
> I dunno, Tex, sounds like two problems to me.
>
> 1. How do I configure all these different databases to support Unicode
> (transparently to my app)?
> 2. How do I write my program to store/retrieve Unicode (independent of
> database)?
>
> When generating SQL statements, there are relatively few differences in the
> actual statements. In most instances that I've dealt with, it's the database
> schema that has to be modified to take advantage of Unicode (since your SQL
> statement will be generated from class String and won't specify a datatype
> explicitly; the JDBC driver "knows" if the column is fixed width and is
> supposed to handle trimming or blank padding for you). Otherwise, a SQL
> statement is basically a big string, and doesn't specify the explicit datatype.
>
> In fact, the JDBC 1.2 spec says "There is no need for Java programmers to
> distinguish among the three different flavors of SQL strings CHAR, VARCHAR, and
> LONGVARCHAR. These can all be expressed identically in Java. It is possible to
> read and write SQL correctly without needing to know the exact data type that
> was expected..."
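
To make the spec's point concrete, here is a minimal sketch in present-day
Java of storing and retrieving a String without naming the column type. The
table and column names (greetings, id, body) and the class name are
hypothetical, and whether the text survives the round trip still depends on
the driver and database configuration, not on this code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class UnicodeRoundTrip {
    // Hypothetical table and columns, for illustration only. Nothing here
    // says whether "body" is CHAR, VARCHAR, NCHAR or NVARCHAR.
    static final String INSERT_SQL =
        "INSERT INTO greetings (id, body) VALUES (?, ?)";
    static final String SELECT_SQL =
        "SELECT body FROM greetings WHERE id = ?";

    // Bind the text as a plain Java String; the driver does the conversion
    // to whatever the column and database character set require.
    static void store(Connection conn, int id, String text) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, id);
            ps.setString(2, text);
            ps.executeUpdate();
        }
    }

    // Read the text back as a String, or null if the row does not exist.
    static String load(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

The same two methods should work unchanged against different vendors; what
changes is the schema behind them and the driver settings.
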
>
> Of course, if your program is going to generate tables, you will have to know
> the specifics of the schema, and this varies by manufacturer.
>
> There are other little configuration tweaks you may have to master, also
> (again, not in your Java code directly). With Oracle, you do have to set the
> NLS_LANG parameter appropriately to get the JDBC driver to generate UTF-8 SQL
> statements (and receive Unicode back from the database). In addition, you have
> to modify (better, create) your Oracle instance to use UTF-8 as the database
> character set.
>
> The use of nchar types is database dependent. Oracle doesn't require/use nchar
> types to store Unicode. Some other databases do. The Transact-SQL 7.x
> documentation maintains that MS SQL Server still requires nchar/nvarchar
> datatypes to store Unicode data. I haven't fooled much with MS-SQL in a while,
> so it could be true (but it sounds dubious, doesn't it?). In any case, it
> shouldn't make a difference in your Java code. It'll be at database
> configuration time that you have to decide.
>
> As usual, the "real" problem with storing Unicode may not be with the character
> set anyway. It's with things like collation sequence (Oracle, for example,
> allows only a single locale collation sequence at one time), normalization (most
> databases don't normalize), and data expansion (does varchar 50 mean 50 characters or 50
> bytes? if it's 50 bytes, how many are enough for *your* data at worst case
> expansion? is worst case expansion still 3 bytes per character given the Outer
> Planes? will your customer accept that?). Your Java code will have to make up
> for the idiosyncrasies of your database with regard to these (locale-related)
> factors.
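
On the data expansion question, a rough sketch of the arithmetic in Java,
assuming the column limit is counted in UTF-8 bytes (that is an assumption;
check what your database actually counts). The class and method names are
made up for illustration. Note that in standard UTF-8 a character outside
the BMP takes 4 bytes, not 3.

```java
import java.nio.charset.StandardCharsets;

class Utf8Expansion {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Would the string fit in a column limited to byteLimit UTF-8 bytes?
    static boolean fitsInBytes(String s, int byteLimit) {
        return utf8Bytes(s) <= byteLimit;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc",                  // ASCII: 1 byte per character
            "\u00e9t\u00e9",        // Latin-1 accents: 2 bytes each
            "\u65e5\u672c\u8a9e",   // CJK in the BMP: 3 bytes each
            "\ud800\udf48"          // U+10348, outside the BMP: 4 bytes
        };
        for (String s : samples) {
            System.out.println(codePoints(s) + " code point(s), "
                + s.length() + " Java char(s), "
                + utf8Bytes(s) + " UTF-8 byte(s)");
        }
    }
}
```

Checking fitsInBytes before the INSERT is one way for the Java side to
compensate when the database counts bytes rather than characters.
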
>
> Best Regards,
>
> Addison
>
> Addison P. Phillips
> Principal Consultant
> Inter-Locale, LLC
> Globalization Engineering & Consulting Services
>
> +1 408.210.3569 (mobile) +1 408.904.4762 (fax)
> mailto: addison@inter-locale.com http://www.inter-locale.com

-- 
------------------------------------------------------------------------------------------------
Tex Texin                     Director, International Products
                                 
Progress Software Corp.       +1-781-280-4271
14 Oak Park                   +1-781-280-4655 (Fax)
Bedford, MA 01730  USA        texin@bedford.progress.com

http://www.progress.com The #1 Embedded Database
http://www.SonicMQ.com JMS Compliant Messaging- Best Middleware Award
http://www.aspconnections.com Leading provider in the ASP marketplace

Progress Globalization Program (New URL)
http://www.progress.com/partners/globalization.htm
------------------------------------------------------------------------------------------------
Come to the Panel on Open Source Approaches to Unicode Libraries
at the Sept. Unicode Conference
http://www.unicode.org/iuc/iuc17


