How do I create a pure UTF-8 web site? Specifically, is there a way to
change the standard servlet classes to use UTF-8 as the default character
encoding instead of ISO-8859-1?
I have looked at the source code for Jakarta Tomcat 3.2.2 and noticed
the following statement in Constants.java:
    public static final String DEFAULT_CHAR_ENCODING = "8859-1";
The implementations of HttpServletRequest and HttpServletResponse use
this constant when creating their default readers and writers and, as a
consequence, the web site ends up being Latin-1.
I have experimented with adding the line:
    response.setContentType("text/html; charset=UTF-8");
This does correctly change the encoding of the response object to
UTF-8, and subsequent output gets sent to the browser in UTF-8. However,
the request object still misinterprets incoming data because it
decodes %XX octets as Latin-1 instead of UTF-8.
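I know there is special code that I can write, such as:
    String param = request.getParameter("parameter1");
    // Recover the raw octets that were decoded as Latin-1...
    byte[] rawVal = param.getBytes("ISO-8859-1");
    // ...and create the string again, this time decoding them as UTF-8.
    param = new String(rawVal, "UTF-8");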
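To make the problem concrete outside a servlet container, here is a minimal sketch of the same re-interpretation. It only simulates the container's behavior with java.net.URLDecoder; the class name, method names, and sample value are made up for illustration and are not part of Tomcat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it stands in for the container's Latin-1 decoding.
final class ReinterpretDemo {

    // Simulates a container that decodes %XX octets as ISO-8859-1.
    static String decodeAsLatin1(String raw) {
        try {
            return URLDecoder.decode(raw, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Gets the original octets back via Latin-1 (a lossless byte-to-char
    // mapping in Java), then decodes those octets as UTF-8.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] rawVal = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawVal, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" percent-encoded as UTF-8: the e-acute is two octets.
        String raw = "caf%C3%A9";
        String wrong = decodeAsLatin1(raw);      // two characters for one
        String fixed = reinterpretAsUtf8(wrong); // back to one character
        System.out.println(wrong.length() + " " + fixed.length()); // prints "5 4"
        System.out.println(fixed.equals("caf\u00e9"));             // prints "true"
    }
}
```

The repair only works because the Latin-1 decoding step loses no information; if the container had decoded with a charset that drops or replaces bytes, the original octets could not be recovered this way.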
However, I would prefer not to have to write special code to re-interpret
data after the fact. There are also other standard classes that seem to
assume ISO-8859-1 as the default character set (such as URLDecoder and
URLEncoder). Since internal data will always be Unicode, I would prefer
to set the default encoding to UTF-8 and be able to write standard Java
code.
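For URLEncoder and URLDecoder specifically, a sketch of what naming the encoding explicitly looks like is below. It assumes a JDK that provides the two-argument encode/decode overloads (added in J2SE 1.4); the wrapper class and its method names are hypothetical helpers, not standard API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper that pins URL encoding and decoding to UTF-8.
final class Utf8UrlCodec {

    // Percent-encodes the UTF-8 octets of the text (spaces become '+').
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Decodes %XX octets as UTF-8 rather than a platform or Latin-1 default.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 au lait";
        String encoded = encode(original);
        System.out.println(encoded);                          // prints "caf%C3%A9+au+lait"
        System.out.println(decode(encoded).equals(original)); // prints "true"
    }
}
```

This still means passing the encoding at every call site, so it does not answer the question of changing the default at a low level.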
Is there an easy way to override the default encoding at a low level so
that all the classes that use the default encoding will just work?
Thanks,
Paul Deuter
Internationalization Manager
Plumtree Software
paul.deuter@plumtree.com