RE: multibyte char display

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Mon Mar 15 2004 - 10:35:44 EST

    Hi Manga,

    There are two things that you need to check here.

    First, is your environment set up to display the non-ASCII characters?
    Solaris offers an impressive array of UTF-8 locales which should allow you
    to view Unicode data. You can switch to one of these by setting your LANG
    environment variable (although this may not solve font problems and other
    issues). Use the command 'locale -a' to list the available locales on your
    machine and look for one that looks like (for example) 'en_US.UTF-8'. [You
    may also be able to use a locale compatible with your data. An EUC-JP
    locale, for example, will display Japanese characters on the console.]

    Note that changing your locale on Unix isn't the whole solution. You may
    also have to install fonts appropriate for the language/data (otherwise
    you'll see hollow boxes, rather than question marks, in place of the
    characters).

    Be sure to set LANG before running your Java program. For example:

       $ LANG=en_US.UTF-8; export LANG; java -cp ...
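
    To confirm that the JVM actually picked up the locale, a small check
    along these lines may help. This is only a sketch: the class name
    ShowEncoding is made up for illustration, and it assumes a JVM recent
    enough to have Charset.defaultCharset().

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ShowEncoding {
    /** Describes the locale and default charset the JVM selected at startup. */
    static String describe() {
        return "default locale  = " + Locale.getDefault() + "\n"
             + "default charset = " + Charset.defaultCharset().name() + "\n"
             + "file.encoding   = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run once with and once without LANG set to compare the results.
        System.out.println(describe());
    }
}
```

    On JDK 18 and later the default charset is UTF-8 regardless of LANG, so
    on those versions the locale line is the more telling one.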

    The second issue you may encounter: is your data actually making it into the
    database? If your database is not configured to use a Unicode encoding (or
    at least a multibyte encoding compatible with your data), then the question
    marks are being created by the database when you store the data originally.

    How database encodings are configured and how you retrieve that information
    varies by database. I have a whitepaper at
    http://www.inter-locale.com/IUC19.pdf (which is rather stale, but has some
    useful information). You might check in your Java program to see if you are
    getting question marks in your Strings. This would indicate a problem with
    the database or (rarely) the JDBC driver configuration.
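
    One way to run that check without involving the terminal at all is to
    print the code points of the String rather than the String itself. A
    minimal sketch follows. The class and method names are hypothetical, and
    the literal in main() merely stands in for a value you would read from
    your ResultSet.

```java
public class CodePointDump {
    /** Formats each code point of s as U+XXXX, separated by spaces. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a value fetched from the database (two Japanese characters).
        String fromDb = "\u65E5\u672C";
        System.out.println(dump(fromDb)); // prints "U+65E5 U+672C"
        // A value damaged on the way in holds real question marks instead.
        System.out.println(dump("??"));   // prints "U+003F U+003F"
    }
}
```

    Because the output is pure ASCII, it reads the same whatever locale or
    fonts the console has. If you see U+003F here, the damage happened before
    the data reached your program.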

    Finally, you should check your code. If you are just writing a little
    console app and your database is correctly configured, the problem may just
    be the locale and setup of your Solaris box as noted above. If you are
    having problems with text files, you should check out your use of
    OutputStreamWriter to ensure that you control the encoding it uses (and
    don't use the default system encoding, which is affected by your runtime
    locale). Writing out files as UTF-8 (instead of System.out.println()) will
    let you use the native2ascii utility or other programs to investigate the
    actual codepoints you are retrieving.
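
    As a rough illustration of controlling the encoding explicitly, something
    like the following could be used. The class name Utf8Dump and the file
    name dump.txt are placeholders, and the sketch assumes a JVM that
    provides StandardCharsets (on older ones, pass the name "UTF-8" instead).

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Dump {
    /** Writes text to path as UTF-8, ignoring the platform default encoding. */
    static void writeUtf8(String path, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text; in practice this would be the String from the database.
        writeUtf8("dump.txt", "\u65E5\u672C\n");
    }
}
```

    Running 'native2ascii -encoding UTF-8 dump.txt' on the result (where your
    JDK still ships that tool) should then show any non-ASCII characters as
    \uXXXX escapes.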

    Best Regards,

    Addison

    Addison P. Phillips
    Director, Globalization Architecture
    webMethods | Delivering Global Business Visibility
    http://www.webMethods.com
    Chair, W3C Internationalization (I18N) Working Group
    Chair, W3C-I18N-WG, Web Services Task Force
    http://www.w3.org/International

    Internationalization is an architecture.
    It is not a feature.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Manga
    > Sent: Monday, March 15, 2004 07:08
    > To: unicode@unicode.org
    > Subject: multibyte char display
    >
    >
    > I use UTF-8 encoding in Java code to store multibyte characters in the
    > db. When I retrieve the multibyte characters from the db, I see
    > "?" in place of the actual multibyte characters. I use Solaris.
    > Is there any environment variable which I can set to see the actual
    > characters in my terminal window?
    >
    > Thanks
    >
    >



    This archive was generated by hypermail 2.1.5 : Mon Mar 15 2004 - 11:25:08 EST