RE: How to create an all UTF-8 Web site using Java (JSP)

From: Paul Deuter
Date: Tue Jul 17 2001 - 13:51:25 EDT

Sorry ... there is a mistake in my email.
I am setting content type on the response object not the request object.
response.setContentType("text/html; charset=UTF-8");

I keep on discovering more and more classes which have this insidious
assumption of 8859-1. If anyone knows how to override this default
encoding in a global fashion that would be great to know. Apparently
there is a command line parameter for WebSphere:

but this does not appear to be universal for all J2EE application servers.

Thanks in advance,

Plumtree Software

-----Original Message-----
From: Paul Deuter
Sent: Monday, July 16, 2001 10:00 PM
To: Unicode List (E-mail)
Subject: How to create an all UTF-8 Web site using Java (JSP)

How do I create a pure UTF-8 web site? Specifically, is there a way to
change the standard servlet class to use UTF-8 as the default char
encoding instead of ISO 8859-1?

I have looked at the source code for Jakarta Tomcat 3.2.2 and noticed
the statement:

public static final String DEFAULT_CHAR_ENCODING = "8859-1"; (in

The various classes such as HttpServletRequest and HttpServletResponse
use this constant when creating the default readers and writers and as a
consequence, the web site ends up being Latin-1.
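To illustrate why the charset handed to a reader or writer matters, here is a minimal sketch using only the standard java.io classes. The class name ExplicitCharsetStreams and its two helper methods are hypothetical, not part of Tomcat or the servlet API, and the code assumes a JDK recent enough to provide java.nio.charset.StandardCharsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tomcat or the servlet API.
class ExplicitCharsetStreams {

    // Encode text through a Writer whose charset is named explicitly.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decode bytes through a Reader whose charset is named explicitly.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // ends in e-acute
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);   // 5: e-acute takes two bytes in UTF-8
        System.out.println(latin1.length); // 4: one byte per character in ISO-8859-1
        // Reading the UTF-8 bytes back with the wrong charset garbles the text.
        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

The same bytes yield different strings depending on the reader's charset, which is how a Latin-1 default in the container can end up making the whole site Latin-1.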

I have experimented with adding the line:

request.setContentType("text/html; charset=UTF-8");

This change does correctly change the encoding of the request object to
UTF-8, and subsequent output gets sent to the browser in UTF-8. However,
the response object incorrectly interprets response data because it is
decoding %XX octets as Latin-1 instead of UTF-8.

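I know there is special code that I can write, such as:
String param = request.getParameter("parameter1");
// Recover the raw bytes that were decoded as ISO-8859-1 ...
byte[] rawVal = param.getBytes("ISO-8859-1");
// ... and create the string again, this time decoding as UTF-8.
param = new String(rawVal, "UTF-8");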
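For the re-interpretation to recover the original text, the bytes have to be taken back out with the charset that was wrongly used for decoding (ISO-8859-1) and then decoded as UTF-8. The following is a small sketch of that round trip using only the JDK; the class ReDecodeDemo and its methods are hypothetical names, and misdecode() merely simulates what a container with a Latin-1 default would do.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; simulates a container that assumes ISO-8859-1.
class ReDecodeDemo {

    // What the browser sent as UTF-8 bytes, decoded (wrongly) as ISO-8859-1.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decoding: ISO-8859-1 maps every byte to one char,
    // so the original bytes can be recovered and decoded as UTF-8.
    static String recover(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String garbled = misdecode(original);
        System.out.println(garbled.length());                  // 5: one char per UTF-8 byte
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

This works only because ISO-8859-1 is a lossless byte-to-char mapping; with other default encodings the wrongly decoded string may already have lost information.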

However I would prefer not to have to write special code to re-interpret
data after the fact. Also there are other standard classes which also
seem to assume ISO-8859-1 as the default character set (such as
URLDecoder and URLEncoder). Since internal data will always be Unicode,
I would prefer to set the default encoding to UTF-8 and be able to write
standard Java code.
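As an illustration of how %XX octets decode differently under the two character sets, here is a sketch that assumes a JDK whose URLEncoder and URLDecoder offer the overloads taking an encoding name (these arrived in releases later than the ones discussed in this thread). The wrapper class UrlCodecDemo is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrapper around the two-argument URLEncoder/URLDecoder calls.
class UrlCodecDemo {

    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        String asUtf8 = encode(text, "UTF-8");
        System.out.println(asUtf8);                       // caf%C3%A9
        System.out.println(encode(text, "ISO-8859-1"));   // caf%E9
        // Decoding the UTF-8 octets with the right charset restores the text;
        // decoding them as ISO-8859-1 yields two wrong characters instead of one.
        System.out.println(decode(asUtf8, "UTF-8").equals(text));      // true
        System.out.println(decode(asUtf8, "ISO-8859-1").equals(text)); // false
    }
}
```

Naming the encoding at every call site is more verbose than a global default, but it removes the dependence on whatever the platform or container assumes.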

Is there an easy way to override the default encoding at a low level so
that all the classes that use the default encoding will just work?

Paul Deuter
Plumtree Software

Paul Deuter
Internationalization Manager
Plumtree Software
