Re: UTF-8

From: addison@inter-locale.com
Date: Tue Sep 19 2000 - 11:36:49 EDT


Hi Stephen,

Java's internal encoding is UTF-16. Every String is encoded as
UTF-16. Since web pages are almost never served in that encoding, JSP
provides a basic mechanism for setting up a character set converter
(essentially an InputStreamReader and an OutputStreamWriter).
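
To make the converter idea concrete, here is a minimal sketch of what an
InputStreamReader/OutputStreamWriter pair does. The class and method names
(CharsetConverterSketch, decode, encode) are made up for illustration, and
it uses in-memory streams rather than a real request or response:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of JSP or the servlet API.
public class CharsetConverterSketch {

    // Bytes in some external encoding -> String (UTF-16 in memory).
    public static String decode(byte[] input, Charset encoding) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), encoding)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // String (UTF-16 in memory) -> bytes in some external encoding.
    public static byte[] encode(String text, Charset encoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, encoding)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+65E5 as it travels over the wire in UTF-8: three bytes.
        byte[] utf8 = { (byte) 0xE6, (byte) 0x97, (byte) 0xA5 };
        String s = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(s.length());                              // 1
        System.out.println(Integer.toHexString(s.charAt(0)));        // 65e5
        System.out.println(encode(s, StandardCharsets.UTF_8).length); // 3
    }
}
```

The point is that the String in the middle is always UTF-16; the encoding
name only matters at the edges, where bytes come in or go out.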

The default page encoding for JSP is ISO-8859-1. The processing page will
hand you UTF-8 instead of 8859-1 if you use the <%@ page
contentType="text/html; charset=utf-8" %> directive in your page.

If you wish to receive a UTF-8 "POST" or "GET" in an 8859-1 page, you will
need to set up the InputStreamReader to convert the characters yourself. I
know I'm being sketchy here, but I'm running late this morning. Let me
know if the contentType directive doesn't fix your problem.
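
As a rough sketch of doing the conversion yourself: if the container has
already decoded the UTF-8 bytes as ISO-8859-1, each byte ends up as one
char in the String, so you can map the chars back to bytes and decode
again as UTF-8. The class and method names here (ParameterRedecodeSketch,
redecodeAsUtf8) are hypothetical, and the sketch assumes the container
really did use ISO-8859-1 and did not replace anything with '?' on the
way in:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the servlet API.
public class ParameterRedecodeSketch {

    // Undo an ISO-8859-1 decode of bytes that were really UTF-8.
    public static String redecodeAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 maps chars 0x00-0xFF one-to-one onto bytes,
        // so this recovers the original bytes exactly.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for request.getParameter("TheText"): the three UTF-8
        // bytes E6 97 A5 (U+65E5) seen as three separate chars.
        String fromRequest = "\u00E6\u0097\u00A5";
        String fixed = redecodeAsUtf8(fromRequest);
        System.out.println(fixed.length());                       // 1
        System.out.println(Integer.toHexString(fixed.charAt(0))); // 65e5
    }
}
```

If the String already contains question marks (3F), the original bytes
are gone and this will not bring them back; the decode has to be right
before the data is lost. On containers that implement Servlet 2.3 or
later, calling request.setCharacterEncoding("UTF-8") before the first
getParameter call is usually the cleaner route for POST bodies.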

Thanks,

Addison

On Tue, 19 Sep 2000, Stephen Toner wrote:

> Hi,
> I am still having trouble with UTF-8 input from a browser. The problem is that my database can't store UTF-8, only UTF-16. I have tried to convert between the two with little success. The trouble is that the input string is obtained from the request object using String temp=request.getParameter("TheText");
> This leaves me with a string which I think (please correct me if I'm wrong) is correctly encoded in UTF-8 (for example, a Japanese character was converted to a 3-byte sequence). However, the String API only allows me to convert a byte array containing non-Unicode text to Unicode, or to convert a String object into a byte array of non-Unicode characters. But what I have is a string of non-Unicode characters which I must convert to Unicode characters. I tried converting it to bytes, which without specifying the encoding left 2 question marks in, and with the encoding specified as UTF-8 just converted each character to UTF-16, giving 6 bytes instead of the 2 bytes that I wanted. If I were able to somehow get the byte values for each character I would be flying, but unfortunately a load of different characters get converted to 3F, the code for a question mark.
> Does anyone know of any way of converting directly in Java?
> Also when I submit a form page with the encoding specified as UTF-8 what actually does the converting from what is in the form to UTF-8?
> Thanks for any help,
> Stephen
>

===========================================================
Addison P. Phillips Principal Consultant
Inter-Locale LLC http://www.inter-locale.com
Los Gatos, CA, USA mailto:addison@inter-locale.com

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services


