From: Leo Broukhis (leob@mailcom.com)
Date: Tue Jun 16 2009 - 17:04:14 CDT
Hello!
This may not be so much a (not "az") Unicode question as a general
computational math question:
Some time ago I wrote - in UTF-8 encoded Russian - to an old
acquaintance of mine.
His response, still in Russian, was labeled as iso-8859-1 and
contained an illegible combination of characters that stumped all
online Cyrillic decoders. These decoders usually try several
conversions among utf-8, iso-8859-1, koi8-r, cp1251, cp866,
mac cyrillic, and iso-8859-5, but they failed in my case.
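For illustration, here is a rough sketch of the kind of pairwise
conversion such decoders attempt. It is not the code of any particular
decoder: the Python codec names, the function name pairwise_repairs,
and the cp1251/koi8-r demo string are assumptions of the sketch. It
only handles the simplest case, where text written in one encoding was
read as another exactly once.

```python
# Encodings named in the message, spelled as Python codec names
# (the spelling is an assumption of this sketch).
CANDIDATES = ["utf_8", "latin_1", "koi8_r", "cp1251",
              "cp866", "mac_cyrillic", "iso8859_5"]


def pairwise_repairs(garbled, codecs=CANDIDATES):
    """Try to undo a single wrong decode.

    For every ordered pair (read_as, written_in), re-encode the garbled
    text with the codec it may have been wrongly read as, then decode the
    resulting bytes with the codec it may really have been written in.
    Pairs that cannot encode or decode the text are skipped.
    """
    results = {}
    for read_as in codecs:
        try:
            raw = garbled.encode(read_as)
        except UnicodeEncodeError:
            continue  # this codec cannot represent the garbled characters
        for written_in in codecs:
            if written_in == read_as:
                continue  # identity conversion changes nothing
            try:
                results[(read_as, written_in)] = raw.decode(written_in)
            except UnicodeDecodeError:
                pass  # byte sequence is not valid in this codec
    return results


# Hypothetical demo: cp1251 text that was mistakenly read as koi8-r.
demo = "Привет".encode("cp1251").decode("koi8_r")
repairs = pairwise_repairs(demo)
print(repairs[("koi8_r", "cp1251")])
```

A real decoder would still have to rank the candidate strings, for
example by how plausible they look as Russian text, and a message that
went through more than one conversion would not be recovered this way.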
Indeed, the set of byte values in the message did not fit into the
value range occupied by the Cyrillic letters in any single encoding:
the word "Привет" ("Hello", U+041F 0440 0438 0432 0435 0442) became
"è•Ë’ÂÚ" (E8 95 CB 92 C2 DA).
Luckily, he quoted my original message, and I was able to decrypt his
response by simple letter-by-letter search-and-replace, without
resorting to letter-frequency cryptanalysis.
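The search-and-replace step can be sketched roughly as follows. The
function names are made up for this illustration, and the garbled
string is rebuilt from the bytes quoted above on the assumption that
the mail client displayed them as cp1252. Only the one word pair given
in this message is used as known text.

```python
def build_substitution(garbled, original):
    """Build a garbled-character -> original-character table from a
    known pair of equal length, rejecting contradictory mappings."""
    if len(garbled) != len(original):
        raise ValueError("known pair must have the same length")
    table = {}
    for g, o in zip(garbled, original):
        if table.setdefault(g, o) != o:
            raise ValueError(f"inconsistent mapping for {g!r}")
    return table


def apply_substitution(text, table):
    """Replace every character that has a known mapping; characters not
    in the table (spaces, digits, unseen letters) are left unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


# The bytes quoted in the message, displayed as cp1252 (an assumption).
garbled_word = bytes.fromhex("E895CB92C2DA").decode("cp1252")
table = build_substitution(garbled_word, "Привет")
print(apply_substitution(garbled_word, table))
```

With a longer quoted passage the table fills in quickly, since it needs
at most one entry per distinct garbled character.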
What would be a way to find out what character set conversions were
applied to the text?
Thanks,
Leo