From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Jun 05 2009 - 05:11:15 CDT
Damon Anderson wrote:
> I'm not sure what you mean by "quoted-printable encoding".
Cf. <http://en.wikipedia.org/wiki/Quoted-printable>;
for an example,
cf. <http://de.wikipedia.org/wiki/Quoted-printable#Beispiel>.
> But it seems
> in both cases, either Notepad or Email that I am choosing which encoding
> to save/send the file in... with the result being that it is possible
> the application is converting the original content created by Unikey.
I expect neither Notepad nor Thunderbird to apply normalization
(cf. <http://www.unicode.org/faq/normalization.html>, and
<http://www.unicode.org/reports/tr15/>) to the data. Rather, I guess
that they simply apply the desired encoding to the data.
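
To illustrate the difference, here is a minimal sketch (Python and the
sample character are my own choices, not taken from your data): encoding
and decoding leaves the code points alone, whereas normalization may
replace them.

```python
import unicodedata

def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Sample only: a Vietnamese letter entered as "e" plus two combining marks.
decomposed = "e\u0302\u0301"

# Encoding to UTF-8 and decoding again does not alter the code points.
round_trip = decomposed.encode("utf-8").decode("utf-8")

# Normalization (NFC here) replaces the sequence with one precomposed letter.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points(round_trip))  # ['U+0065', 'U+0302', 'U+0301']
print(code_points(composed))    # ['U+1EBF']
```

If your hex dump shows the same code points that the keyboard driver
delivered, no normalization has taken place.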
In Notepad, you would choose "Unicode Big Endian" (i. e. UTF-16BE)
encoding to store the unaltered data, as delivered from the keyboard
driver.
In Thunderbird, you would choose "Unicode (UTF-8)" encoding, and
"quoted-printable". I am using the German Thunderbird version, so
I can only guess the menu items you will have to use:
the encoding is under settings/encoding; the quoted-printable option is
under options/settings/general (or something similar). With these settings,
Thunderbird will convert the text into UTF-8, and then apply the
MIME quoted-printable encoding.
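
The two steps can be sketched as follows. This is only an illustration
using Python's standard quopri module; the function name and the sample
word are my assumptions, and Thunderbird's own implementation surely
looks different.

```python
import quopri

def to_quoted_printable(text):
    """Encode text as UTF-8, then apply MIME quoted-printable."""
    utf8_bytes = text.encode("utf-8")                 # step 1: UTF-8
    return quopri.encodestring(utf8_bytes).decode("ascii")  # step 2: QP

# Sample only: "Grüße" contains two non-ASCII letters.
print(to_quoted_printable("Gr\u00FC\u00DFe"))  # Gr=C3=BC=C3=9Fe
```

Each non-ASCII letter becomes two UTF-8 bytes here, and each of those
bytes is then written as "=" followed by two hex digits.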
> By the way when I get my Hex dump how do I match that to the Unicode chart?
The Unicode charts show the Unicode Scalar Values in hex;
a hex dump of UTF-16BE data can be compared directly to the charts
(for characters in the Basic Multilingual Plane, each 16-bit unit
is the chart value itself).
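
If you have no hex-dump tool at hand, a few lines of Python will do.
This is a sketch under my own assumptions: the function name and the
sample word are merely illustrative.

```python
def utf16be_units(text):
    """Return the UTF-16BE code units of text as 4-digit hex strings."""
    data = text.encode("utf-16-be")
    # Every two bytes form one 16-bit code unit, most significant byte first.
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]

# Sample only: "Việt" with the precomposed letter U+1EC7.
print(utf16be_units("Vi\u1EC7t"))  # ['0056', '0069', '1EC7', '0074']
```

Each four-digit group can be looked up in the charts as it stands. Note
that a file saved by Notepad as "Unicode Big Endian" begins with the
byte order mark FE FF, which is not part of your text.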
To compare UTF-8 data to the charts, you would have to reverse
the UTF-8 encoding first; cf.
<http://www.systems.uni-konstanz.de/Otto/Vortrag/Charset/UTF-8_Magic_Pocket_Encoder.pdf>,
or <http://skew.org/cumped/>.
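
Alternatively, you can let a program reverse the UTF-8 encoding. The
following is a small sketch in Python; the function name and the sample
bytes are my assumptions, chosen only for illustration.

```python
def utf8_hex_to_code_points(hex_dump):
    """Decode a UTF-8 hex dump and list the code points in U+XXXX notation."""
    data = bytes.fromhex(hex_dump)  # spaces between the bytes are ignored
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]

print(utf8_hex_to_code_points("56 69 E1 BB 87 74"))
# ['U+0056', 'U+0069', 'U+1EC7', 'U+0074']

# The same by hand for the three-byte sequence E1 BB 87:
# strip the marker bits and concatenate the remaining 4 + 6 + 6 bits.
manual = ((0xE1 & 0x0F) << 12) | ((0xBB & 0x3F) << 6) | (0x87 & 0x3F)
print(f"U+{manual:04X}")  # U+1EC7
```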
In quoted-printable, you get the hex value of each non-ASCII byte
in three characters, e. g. "=FC" for the byte FC (hex).
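
To turn a quoted-printable line back into a plain hex dump, something
like the following sketch may help. It uses Python's standard quopri
module; the function name and the sample line are my assumptions.

```python
import quopri

def qp_to_hex(qp_text):
    """Undo quoted-printable and return the raw bytes as a hex dump."""
    raw = quopri.decodestring(qp_text.encode("ascii"))
    return " ".join(f"{byte:02X}" for byte in raw)

print(qp_to_hex("=FC"))           # FC
print(qp_to_hex("Vi=E1=BB=87t"))  # 56 69 E1 BB 87 74
```

The ASCII characters appear with their own byte values, the "=XX"
groups with the value XX. If the mail was sent as UTF-8, the resulting
bytes still have to be decoded from UTF-8 before you compare them to
the charts.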
Good luck,
Otto Stolz
This archive was generated by hypermail 2.1.5 : Fri Jun 05 2009 - 05:15:05 CDT