Re: posting of unicode data to servlet

From: addison@inter-locale.com
Date: Wed Nov 29 2000 - 14:05:32 EST


Try HttpServletResponse.setContentType("text/html; charset=UTF-8"); in the servlet that generates the form page.

The JSP page directive is the equivalent of this call.

Addison

===========================================================
Addison P. Phillips Principal Consultant
Inter-Locale LLC http://www.inter-locale.com
Los Gatos, CA, USA mailto:addison@inter-locale.com

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services

On Wed, 29 Nov 2000, Bhalchandra Patil wrote:

> Thanks Addison,
>
> but I am still not clear.
> I have rectified the mistake in the META tag.
>
> I could not find any servlet API equivalent of
> <%@ page contentType="text/html; charset=UTF-8" %>.
>
> However, I have changed the default character set of the web server to UTF-8.
>
> Now I am entering a single Chinese character in the text field (named, say,
> "uniname") on the HTML page; its Unicode value is 20840 decimal, or 0x5168 hex.
> When I submit the page, the request goes to the servlet, and when I
> call String str = request.getParameter("uniname"), I expect a string
> of length 1 (with (long)str.charAt(0) equal to 20840).
> [That is the kind of string I want.]
>
> Is that the right form for the Unicode string I am expecting?
>
> Instead, it gives me a string of two characters, with values 145 and
> 83.
>
> Is there some fundamental mistake I am making, or is it something to do
> with the web server's handling of POSTed Unicode data?
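The two values reported above are telling: 145 and 83 are the bytes 0x91 0x53, which is exactly how U+5168 (decimal 20840) is encoded in Shift_JIS. That suggests, though nothing in the thread confirms it, that the browser submitted the form in Shift_JIS and the server turned each byte into one char as if it were ISO-8859-1. The minimal sketch below reproduces that under those assumptions, together with a commonly used workaround: recover the raw bytes from the Latin-1 string and decode them with the charset the browser actually used. It needs only the JDK (Java 7 or later for StandardCharsets, not the JRE 1.2 mentioned in the thread); the class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeCheck {
    // U+5168 (decimal 20840), the character typed into the "uniname" field.
    public static final String EXPECTED = "\u5168";

    // Assumption: the server decoded each posted byte as ISO-8859-1,
    // turning every byte into one char with the same numeric value.
    public static String decodeAsLatin1(byte[] posted) {
        return new String(posted, StandardCharsets.ISO_8859_1);
    }

    // Workaround: ISO-8859-1 maps chars 0..255 back to the same bytes,
    // so the original bytes can be recovered and decoded correctly.
    public static String redecode(String garbled, Charset actual) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        byte[] posted = EXPECTED.getBytes(sjis);
        System.out.println(Arrays.toString(posted)); // [-111, 83] (Java bytes are signed)

        String garbled = decodeAsLatin1(posted);
        System.out.println((int) garbled.charAt(0) + " " + (int) garbled.charAt(1)); // 145 83

        String fixed = redecode(garbled, sjis);
        System.out.println(fixed.length() + " " + (int) fixed.charAt(0)); // 1 20840
    }
}
```

The re-decode only works because ISO-8859-1 loses no information; the real fix is still to make the page's declared charset and the server's decoding agree.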
>
>
> regards,
> bhala
>
> ----- Original Message -----
> From: <addison@inter-locale.com>
> To: Bhalchandra Patil <bpatil@mahindrabt.com>
> Cc: Unicode List <unicode@unicode.org>
> Sent: Wednesday, November 29, 2000 10:42 PM
> Subject: Re: posting of unicode data to servlet
>
>
> > Hi Bhala,
> >
> > When you use request.getParameter(), the request class converts the data
> > POSTed to a Java String object. This includes converting the data from
> > whatever encoding the servlet *thinks* the page uses to Java's internal
> > representation, which is UCS-2 (i.e. Unicode).
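To make that conversion concrete: a browser percent-encodes a form field using the bytes of the page's charset, and the server must decode those bytes with the same charset to get the right String back. The sketch below uses java.net.URLDecoder as a stand-in for what getParameter() does internally; that is an assumption for illustration, since servlet containers use their own parsers. For U+5168 typed on a UTF-8 page the posted value is %E5%85%A8: decoded as UTF-8 it is one char, decoded as ISO-8859-1 it becomes three.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecoding {
    // What a browser sends for U+5168 typed into a field on a UTF-8 page:
    //   uniname=%E5%85%A8   (the UTF-8 bytes E5 85 A8, percent-encoded)
    public static final String POSTED_VALUE = "%E5%85%A8";

    // Decodes one url-encoded form value using the charset the server
    // *believes* the page was in (illustrative stand-in for getParameter()).
    public static String decodeValue(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String right = decodeValue(POSTED_VALUE, "UTF-8");
        String wrong = decodeValue(POSTED_VALUE, "ISO-8859-1");
        System.out.println(right.length() + " " + (int) right.charAt(0)); // 1 20840
        System.out.println(wrong.length());                               // 3
    }
}
```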
> >
> > It is therefore important to tell the servlet what the encoding of the
> > page is. Just putting a META tag into the page won't do it. In a JSP page,
> > for example, you can declare:
> >
> > <%@ page contentType="text/html; charset=UTF-8" %>
> >
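> > Note that your META tag has a typo in it: there should not be a
> > double quote after charset=. The attribute is also spelled HTTP-EQUIV
> > (hyphen, not underscore), and the type is text/html, not text-html.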
> >
> > You should be aware that you can generate the page in any valid character
> > set and WebLogic's servlet engine will convert the results to Unicode for
> > you. For example, you might choose the Big5 character encoding for
> > a Traditional Chinese page. The page directive will cause data POSTed
> > to you to be converted to a Java String (and thus to Unicode).
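The point that any valid charset works can be checked with a plain round trip: encode U+5168 the way a browser would for a page in a given charset, then decode the bytes with that same charset, as a correctly configured servlet engine would. A minimal sketch using only java.nio.charset.Charset; the list of charsets is an arbitrary choice and assumes a JDK that ships Big5, GB2312 and Shift_JIS (full JDK builds include them).

```java
import java.nio.charset.Charset;

public class RoundTrip {
    // Encodes text as a browser would for a page in `charsetName`, then
    // decodes the bytes with the same charset, as the server should.
    public static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] onTheWire = text.getBytes(cs);
        return new String(onTheWire, cs);
    }

    public static void main(String[] args) {
        String zh = "\u5168"; // U+5168, decimal 20840
        for (String name : new String[] {"UTF-8", "Big5", "GB2312", "Shift_JIS"}) {
            int bytes = zh.getBytes(Charset.forName(name)).length;
            boolean same = roundTrip(zh, name).equals(zh);
            System.out.println(name + ": " + bytes + " bytes on the wire, round trip ok = " + same);
        }
    }
}
```

The byte counts differ per charset, but each one decodes back to the same single Java char as long as both sides agree on the charset.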
> >
> > If you want to get access to the specific *characters* in the String you
> > can use the various methods for accessing chars and char arrays in the
> > String class in conjunction with the Character class to access all kinds
> > of useful information about specific characters. Using getBytes() the way
> > you've described converts the characters to a byte-oriented encoding,
> > such as UTF-8, which is not really what you want to do in this case.
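Once getParameter() returns the right String, the per-character details come straight from String and Character, with no getBytes() involved. A small sketch; the literal "\u5168" stands in for the value of request.getParameter("uniname"), and describe() is a hypothetical helper name. It uses StandardCharsets, so Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class InspectChars {
    // Describes each UTF-16 code unit of a String: its decimal value,
    // its U+XXXX form, and the Unicode block it belongs to.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append((int) c)
              .append(String.format(" U+%04X ", (int) c))
              .append(Character.UnicodeBlock.of(c))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "\u5168"; // stands in for request.getParameter("uniname")
        System.out.print(describe(str)); // 20840 U+5168 CJK_UNIFIED_IDEOGRAPHS
        // getBytes() converts to a byte encoding instead: 3 bytes in UTF-8
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```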
> >
> > Best Regards,
> >
> > Addison
> >
> >
> > On Wed, 29 Nov 2000, Bhalchandra Patil wrote:
> >
> > > Hi,
> > >
> > > I am running a servlet on WebLogic (JRE 1.2). The HTML page should accept
> > > input in any character set, say Chinese. That value is posted to the
> > > servlet. I want to retrieve the Unicode value of the character in the
> > > servlet.
> > >
> > > In the HTML page, I have specified this META tag:
> > > <META HTTP_EQUIV="Content-Type" content="text-html; charset="UTF-8">
> > >
> > > In the servlet, I am using String str = request.getParameter("name"),
> > > but str.getBytes("UTF8") does not work.
> > >
> > > What should I do to get the Unicode values of the characters entered?
> > >
> > > Please help!
> > >
> > > regards,
> > > bhala
> > >
> >
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT