Re: Unicode character transformation through XSLT

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Mar 14 2003 - 12:40:40 EST


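    Nooo - Java's old "UTF" functions (readUTF/writeUTF) do not process standard UTF-8! They exist for
    String serialization and use a Java-internal format.
    Use the Java Reader/Writer classes (for example InputStreamReader/OutputStreamWriter with an
    explicit "UTF-8" encoding) instead of these old ones!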

    See the Java tutorials on Internationalization:
    http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
    http://java.sun.com/docs/books/tutorial/i18n/text/index.html
    http://java.sun.com/docs/books/tutorial/i18n/index.html
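
    A minimal sketch of the Reader approach recommended above, assuming the input is a plain
    stream of standard UTF-8 bytes. The class and method names (Utf8ReaderDemo, readUtf8) are
    made up for illustration. StandardCharsets needs Java 7 or later; on J2SE 1.4 the
    encoding would be passed as the string "UTF-8" instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class Utf8ReaderDemo {
    // Decode an entire stream of standard UTF-8 bytes into a String.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // InputStreamReader does the byte-to-char decoding; the charset is explicit.
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
        String s = readUtf8(new ByteArrayInputStream(utf8));
        // Prints "1 e9": two bytes decoded into one char, U+00E9.
        System.out.println(s.length() + " " + Integer.toHexString(s.charAt(0)));
    }
}
```

    The same pattern works for any InputStream, for example one obtained from a file or a
    database column, as long as the bytes really are UTF-8.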

    See the descriptions of the readUTF() functions (my highlighting with ***):

    http://java.sun.com/j2se/1.4/docs/api/java/io/DataInputStream.html#readUTF(java.io.DataInput)

    "Reads from the stream in a representation of a Unicode character string encoded in ***Java modified
    UTF-8*** format; this string of characters is then returned as a String. The details of the
    ***modified UTF-8*** representation are exactly the same as for the readUTF method of DataInput."

    http://java.sun.com/j2se/1.4/docs/api/java/io/DataInput.html#readUTF()

    Java's *modified* UTF-8 in its "UTF" functions resembles CESU-8 (supplementary characters are
    written as two encoded surrogates), and writes U+0000 with two bytes (C0 80) instead of one, as far
    as I remember.
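
    To see the difference concretely, here is a small sketch that compares what
    DataOutputStream.writeUTF() emits with standard UTF-8 for U+0000 and for a supplementary
    character (U+10000). The class and helper names are hypothetical. Note that writeUTF()
    also prepends a two-byte length, which is another reason its output is not plain UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class ModifiedUtf8Demo {
    // Bytes written by writeUTF: 2-byte length prefix, then modified UTF-8.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Bytes of the same string in standard UTF-8.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String nul = "\u0000";
        String supplementary = "\uD800\uDC00"; // U+10000 as a surrogate pair

        System.out.println(hex(modifiedUtf8(nul)));           // 00 02 c0 80
        System.out.println(hex(standardUtf8(nul)));           // 00
        System.out.println(hex(modifiedUtf8(supplementary))); // 00 06 ed a0 80 ed b0 80
        System.out.println(hex(standardUtf8(supplementary))); // f0 90 80 80
    }
}
```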

    markus

    Yung-Fong Tang wrote:
    > what is rsResult? Blob?
    > you probably need to use
    >
    > BufferedInputStream
    >
    > and
    >
    > DataInputStream
    >
    > to pipe the InputStream
    > and use readChar or readUTF in the InputStream interface instead.
    > See http://www.webdeveloper.com/java/java_jj_read_write.html and
    > http://java.sun.com/j2se/1.4/docs/api/java/io/DataInputStream.html#readUTF()
    > for more info.

    -- 
    Opinions expressed here may not reflect my company's positions unless otherwise noted.
    

