[Fwd: [question] UTF-8 issue--update]

From: Chat S. Depasucat (cdepasucat@ntsp.nec.co.jp)
Date: Thu Oct 08 2009 - 05:56:49 CDT

Next message: Marion Gunn: "Re: [Unicode Announcement] Unicode Haiku Contest"

Previous message: Rick McGowan: "Re: Versioned charts disappeard?"

Update:

The Java program reads the XML file through an InputStreamReader, which
uses StreamDecoder internally.

Is this safe enough against "non-shortest form" sequences?
Do I have nothing to worry about?
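
One way to make the reader's behaviour explicit is to hand
InputStreamReader a CharsetDecoder that reports malformed input instead
of silently substituting U+FFFD. The following is only a minimal sketch,
not the actual program: the names StrictUtf8Read, strictReader and
readOrNull are made up for illustration, and it assumes a JDK whose
UTF-8 decoder treats non-shortest forms as malformed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StrictUtf8Read {
    // Hypothetical helper: a reader that fails on malformed UTF-8
    // instead of replacing the offending bytes with U+FFFD.
    static Reader strictReader(InputStream in) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Returns the decoded text, or null if the bytes are not well-formed UTF-8.
    static String readOrNull(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        Reader reader = strictReader(new ByteArrayInputStream(bytes));
        try {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // MalformedInputException is a subclass of IOException.
            return null;
        } finally {
            try {
                reader.close();
            } catch (IOException ignored) {
                // Nothing useful to do for an in-memory stream.
            }
        }
    }

    public static void main(String[] args) {
        byte[] wellFormed = {'<', 'a', '/', '>'};
        // 0xC0 0xBC is a two-byte, non-shortest encoding of '<' (U+003C).
        byte[] overlong = {'<', (byte) 0xC0, (byte) 0xBC, '>'};
        System.out.println(readOrNull(wellFormed));
        System.out.println(readOrNull(overlong));
    }
}
```

With the plain InputStreamReader(in, "UTF-8") constructor the same bytes
would not raise an error; the malformed bytes are replaced with U+FFFD.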

Thanks a lot.

attached mail follows:

I'm really thankful that I found this mailing list.

I have a few UTF-8 issues that I hope somebody can shed light on:

I understand that the UTF-8 bit layout would let a Unicode character be
written as more than one byte sequence.
US-ASCII characters, for example, could be written in the "shortest
form" (a single byte) or in a longer "non-shortest form".
Because of this, Java 1.6.0_11 changed its UTF-8 charset implementation
so that it no longer accepts the "non-shortest form".

Here are my questions:
1. How does a UTF-8 decoder identify that a byte sequence is illegal,
that is, that the sequence is in non-shortest form?
2. Who or what encodes text in "non-shortest form"?
    For XML files, for example, what turns the characters into bytes,
which could then end up in "shortest" or "non-shortest" form?

   When a program reads an XML file encoded in UTF-8, is it possible
that the bytes it decodes are in "non-shortest form"?
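
To make question 1 concrete: each sequence length has a smallest code
point that really needs that many bytes (U+0080 for two bytes, U+0800
for three, U+10000 for four), and a decoded value below that minimum
means the sequence is in non-shortest form. Below is a rough sketch of
that check for a single sequence. It is not how the JDK implements it,
and the names ShortestFormCheck and decodeOne are made up for
illustration.

```java
import java.nio.charset.Charset;

class ShortestFormCheck {
    // Smallest code point that needs 1, 2, 3 or 4 bytes in UTF-8.
    private static final int[] MIN_CODE_POINT = {0x0, 0x80, 0x800, 0x10000};

    // Decodes exactly one UTF-8 sequence.
    // Returns the code point, or -1 if the sequence is ill-formed.
    static int decodeOne(byte[] b) {
        if (b.length == 0) {
            return -1;
        }
        int lead = b[0] & 0xFF;
        int length;
        int cp;
        if (lead < 0x80) {
            length = 1;
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            cp = lead & 0x07;
        } else {
            return -1; // stray continuation byte or invalid lead byte
        }
        if (b.length != length) {
            return -1;
        }
        for (int i = 1; i < length; i++) {
            int c = b[i] & 0xFF;
            if ((c & 0xC0) != 0x80) {
                return -1; // not a continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < MIN_CODE_POINT[length - 1]) {
            return -1; // non-shortest form
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return -1; // out of range, or a surrogate code point
        }
        return cp;
    }

    public static void main(String[] args) {
        // A conforming encoder writes 'A' as the single byte 0x41.
        byte[] encoded = "A".getBytes(Charset.forName("UTF-8"));
        System.out.println(encoded.length + " byte(s), decodes to " + decodeOne(encoded));
        // 0xC1 0x81 carries the same bits for 'A' in two bytes: non-shortest form.
        System.out.println(decodeOne(new byte[] {(byte) 0xC1, (byte) 0x81}));
    }
}
```

On question 2, a conforming encoder such as Java's String.getBytes with
the UTF-8 charset only writes the shortest form, so non-shortest
sequences would have to come from a faulty encoder or from bytes that
were crafted by hand.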

I hope somebody can help me understand this.
Thanks so much.

This archive was generated by hypermail 2.1.5 : Thu Oct 08 2009 - 10:43:07 CDT