From: Jain, Pankaj (MED, TCS) (Pankaj.Jain@med.ge.com)
Date: Wed Mar 12 2003 - 12:23:48 EST
Hi ftang/James,
Thanks for the detailed explanation. Now I understand the root problem of my
error.
I have the following string stored in the database as a Long, in which the
special character (?) is equivalent to an ndash (-):
E8C ? 6 to 10
I am using the code below to write the string from the database to a
property file, and in the property file I get the following string:
value= E8C \uFFE2\uFF80\uFF93 6 to 10
Since \uFFE2\uFF80\uFF93 is not equivalent to an ndash, I cannot
figure out why it ends up in the property file.
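As a side note on the property file itself: Properties.store writes any character outside printable ASCII as a \uXXXX escape, so the escapes in the file simply mirror whatever characters were already in the Java string. A minimal sketch of that behaviour follows; the class and method names are made up for illustration and are not from the posted program.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class StoreEscapeSketch {

    /** Stores a single property in memory and returns the text that Properties.store wrote. */
    public static String storeToString(String key, String value) throws IOException {
        Properties prop = new Properties();
        prop.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        prop.store(out, null); // null means no comment line; a date line is still written
        // store(OutputStream) always writes ISO 8859-1 bytes
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // A string that really contains U+2013 is written with a single escape.
        System.out.print(storeToString("value", "E8C \u2013 6 to 10"));
        // A string that contains U+FFE2 U+FF80 U+FF93 is written with three escapes.
        System.out.print(storeToString("value", "E8C \uFFE2\uFF80\uFF93 6 to 10"));
    }
}
```

So if the file shows three escapes, the three wrong characters were most likely already in the string before store was called.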
Do I need to specify an encoding such as UTF-8 in my Java program?
Please let me know where the problem is.
Here is my code:
while (rsResult.next())
{
    /* Get the file contents from the value column */
    ipStream = rsResult.getBinaryStream("VALUE");
    strBuf = new StringBuffer();
    while ((chunk = ipStream.read()) != -1)
    {
        byte byChunk = new Integer(chunk).byteValue();
        strBuf.append((char) byChunk);
    }
    prop.setProperty(rsResult.getString("KEY"), strBuf.toString());
}
/* Write to o/p stream */
//opFile = new FileOutputStream(strFileName+".properties");
opFile = new FileOutputStream(strFileName);
/* Store the Properties files */
prop.store(opFile, "Resource Bundle created from Database View " + vctView.get(i));
Thanks
-Pankaj
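For readers following the thread, here is a minimal sketch of where \uFFE2\uFF80\uFF93 can come from in a loop like the one posted, and one possible way to avoid it. A Java byte is signed, so casting the byte 0xE2 to char sign-extends it to 0xFFE2. The sketch assumes the column holds UTF-8 bytes (the E2 80 93 sequence suggests this, but it is an assumption about the database), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamDecodeSketch {

    /** Mirrors the posted loop: each byte is cast straight to char. */
    public static String readByCastingBytes(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int chunk;
        while ((chunk = in.read()) != -1) {
            byte b = (byte) chunk; // 0xE2 becomes the negative byte -30
            sb.append((char) b);   // sign extension turns it into U+FFE2
        }
        return sb.toString();
    }

    /** Collects the raw bytes first, then decodes them in one step as UTF-8 (assumed encoding). */
    public static String readAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] block = new byte[4096];
        int n;
        while ((n = in.read(block)) != -1) {
            buf.write(block, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "E8C ", then the UTF-8 bytes E2 80 93 of U+2013, then " 6 to 10"
        byte[] stored = "E8C \u2013 6 to 10".getBytes(StandardCharsets.UTF_8);

        String broken = readByCastingBytes(new ByteArrayInputStream(stored));
        String fixed = readAsUtf8(new ByteArrayInputStream(stored));

        System.out.println(Integer.toHexString(broken.charAt(4))); // ffe2
        System.out.println(Integer.toHexString(fixed.charAt(4)));  // 2013
    }
}
```

Decoding the whole byte array at once, rather than byte by byte, matters because one character may span several bytes. If the column is not UTF-8, the charset passed to the String constructor would need to change accordingly.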
-----Original Message-----
From: ftang@netscape.com [mailto:ftang@netscape.com]
Sent: Tuesday, March 11, 2003 6:09 PM
To: Jain, Pankaj (MED, TCS)
Cc: 'jameskass@att.net'; 'unicode@unicode.org'
Subject: Re: Unicode character transformation through XSLT
Because the following processing was applied to your Unicode data:
1. The \u escapes were converted to Unicode, so
\uFFE2\uFF80\uFF93
became three Unicode characters:
U+FFE2, U+FF80, U+FF93
This is OK.
2. Some code that throws away the high 8 bits was applied to your data, so
it became 3 bytes:
E2 80 93
3. Some other code then treated those bytes as UTF-8 and converted them to
UCS-2 again, so
E2 = 1110 0010 and the rightmost 4 bits, 0010, are used for UCS-2
80 = 1000 0000 and the rightmost 6 bits, 00 0000, are used for UCS-2
93 = 1001 0011 and the rightmost 6 bits, 01 0011, are used for UCS-2
[0010] [00 0000] [01 0011] = 0010 0000 0001 0011 = 2013
U+2013 is EN DASH.
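Steps 2 and 3 above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the code that actually ran in the poster's pipeline; the class and method names are invented for the example.

```java
public class Utf8Steps {

    /** Step 2: keep only the low 8 bits of each UTF-16 code unit. */
    public static int[] lowBytes(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(i) & 0xFF;
        }
        return out;
    }

    /** Step 3: combine one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx). */
    public static int decodeThreeByteUtf8(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12)  // rightmost 4 bits of the lead byte
             | ((b1 & 0x3F) << 6)   // rightmost 6 bits of the first trail byte
             | (b2 & 0x3F);         // rightmost 6 bits of the second trail byte
    }

    public static void main(String[] args) {
        int[] b = lowBytes("\uFFE2\uFF80\uFF93");
        int cp = decodeThreeByteUtf8(b[0], b[1], b[2]);
        System.out.printf("%02X %02X %02X -> U+%04X%n", b[0], b[1], b[2], cp);
        // prints: E2 80 93 -> U+2013
    }
}
```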
So somewhere in your code there is something very bad that corrupts
your data.
Steps 2 and 3 are very bad. You probably need to find out where they happen
and remove that code.
Read my paper at
http://people.netscape.com/ftang/paper/textintegrity.html
Your Java code probably has one or two of the bugs listed in my paper.
Jain, Pankaj (MED, TCS) wrote:
James,
Thanks, it's working for me now.
But I still wonder why \uFFE2\uFF80\uFF93 gives an ndash in
HTML.
If you have any information on this, then please let me know.
Thanks
-Pankaj
-----Original Message-----
From: jameskass@att.net [mailto:jameskass@att.net]
Sent: Monday, March 10, 2003 7:59 PM
To: Jain, Pankaj (MED, TCS)
Cc: 'unicode@unicode.org'
Subject: Re: Unicode character transformation through XSLT
Pankaj Jain wrote,
My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
from resource bundle property file which is equivalent to ndash(-) and
its
U+2013 is the ndash (–). It is represented in UTF-8 by three
hex bytes: E2 80 93.
But \uFFE2 is the fullwidth not sign,
\uFF80 is the halfwidth katakana letter ta,
and \uFF93 is the halfwidth katakana letter mo.
Perhaps the reason you see three question marks is that the font
you are using doesn't support full width and half width characters.
What happens if you replace your string \uFFE2\uFF80\uFF93 with
\u2013 ?
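One quick way to check which characters a set of \u escapes actually names is Character.getName (available since Java 7). The small sketch below is only an illustration; the class and method names are hypothetical.

```java
public class CodePointNames {

    /** Returns one "U+XXXX NAME" line per code point in the string. */
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
            sb.append(String.format("U+%04X %s\n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three characters found in the property file
        System.out.print(describe("\uFFE2\uFF80\uFF93"));
        // The character that was intended
        System.out.print(describe("\u2013"));
    }
}
```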
Best regards,
James Kass
This archive was generated by hypermail 2.1.5 : Wed Mar 12 2003 - 13:10:52 EST