I've made a quick-and-dirty utility that converts from all the popular
Japanese encodings to Unicode. It's basically a patchwork of the
encoding files on Unicode's server-- no brilliance in it, but I couldn't
find a similar command line tool out there that's robust for Unicode
(native2ascii, which comes with JDK 1.1, is geared towards Java source
files and spooks out often, with lots of crashing out with IO exceptions).
It's easy to use because it's plain and simple, and it auto-detects the
popular Japanese formats.
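For anyone wondering what auto-detection might involve, here is a rough sketch of one common heuristic. It is NOT the code in jisuni; the names (jp_encoding, guess_encoding, and so on) are made up for illustration. It assumes ISO-2022-JP can be recognized by its escape sequences, and that EUC-JP and Shift_JIS can be told apart by checking which byte-range rules the whole buffer satisfies. Some inputs (e.g. runs of half-width katakana) are legal in both, and this sketch simply prefers EUC-JP in that case.

```c
#include <stddef.h>

typedef enum {
    ENC_ASCII, ENC_ISO2022JP, ENC_EUCJP, ENC_SJIS, ENC_UNKNOWN
} jp_encoding;

/* Returns 1 if every byte sequence in p[0..n) is well-formed EUC-JP. */
static int valid_eucjp(const unsigned char *p, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = p[i];
        if (c < 0x80) {
            i += 1;                              /* ASCII */
        } else if (c == 0x8E) {                  /* half-width katakana */
            if (i + 1 >= n || p[i + 1] < 0xA1 || p[i + 1] > 0xDF) return 0;
            i += 2;
        } else if (c == 0x8F) {                  /* JIS X 0212, three bytes */
            if (i + 2 >= n) return 0;
            if (p[i + 1] < 0xA1 || p[i + 1] > 0xFE) return 0;
            if (p[i + 2] < 0xA1 || p[i + 2] > 0xFE) return 0;
            i += 3;
        } else if (c >= 0xA1 && c <= 0xFE) {     /* JIS X 0208, two bytes */
            if (i + 1 >= n || p[i + 1] < 0xA1 || p[i + 1] > 0xFE) return 0;
            i += 2;
        } else {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if every byte sequence in p[0..n) is well-formed Shift_JIS. */
static int valid_sjis(const unsigned char *p, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = p[i];
        if (c < 0x80 || (c >= 0xA1 && c <= 0xDF)) {
            i += 1;                              /* ASCII or half-width katakana */
        } else if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC)) {
            unsigned char t;
            if (i + 1 >= n) return 0;            /* truncated pair */
            t = p[i + 1];
            if (t < 0x40 || t == 0x7F || t > 0xFC) return 0;
            i += 2;
        } else {
            return 0;
        }
    }
    return 1;
}

/* Guess the encoding of a buffer; ambiguous input is reported as EUC-JP. */
jp_encoding guess_encoding(const unsigned char *p, size_t n)
{
    size_t i;
    int saw_high = 0;
    for (i = 0; i < n; i++) {
        if (p[i] == 0x1B && i + 1 < n && (p[i + 1] == '$' || p[i + 1] == '('))
            return ENC_ISO2022JP;                /* JIS escape sequence */
        if (p[i] >= 0x80) saw_high = 1;
    }
    if (!saw_high) return ENC_ASCII;
    if (valid_eucjp(p, n)) return ENC_EUCJP;
    if (valid_sjis(p, n)) return ENC_SJIS;
    return ENC_UNKNOWN;
}
```

Calling guess_encoding() on the first few kilobytes of a file is usually enough for a guess like this; a real detector would probably also weigh character frequencies rather than rely on byte ranges alone.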
While I've tested it on fairly large real life files, I'm looking for
people with an ANSI C compiler (and preferably programming knowledge)
and some spare time to test it for me.
What I'm looking for in the testing is:
1) How well it handles damaged files (i.e. does it convert damaged files
to Unicode the way it's supposed to?)
2) Performance (shouldn't be TOO bad... O(1) per character... but I know
it could be better because I did some pretty quick, dirty, inelegant hacks
for handling various EOL and EOF situations)
3) Auto-detection ability. Are there certain sequences it should be able
to auto-detect but fails?
4) Cases where the program isn't portable depending on the compiler.
5) Incorrect terminology/facts for the Unicode references in the docs.
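To make item 1 concrete, here is a small sketch of one possible policy for damaged input. It is an assumption on my part about what "supposed to" could mean, not a description of what jisuni actually does: every malformed or truncated Shift_JIS sequence becomes U+FFFD (REPLACEMENT CHARACTER) and decoding resumes at the very next byte. The function names are hypothetical, and lookup_sjis() is a three-entry stand-in for a real mapping table.

```c
#include <stddef.h>

#define REPLACEMENT 0xFFFDul

/* Stand-in for a real mapping table: only three pairs are known here. */
static unsigned long lookup_sjis(unsigned char lead, unsigned char trail)
{
    unsigned int code = ((unsigned int)lead << 8) | trail;
    switch (code) {
    case 0x93FA: return 0x65E5ul;   /* U+65E5 */
    case 0x967B: return 0x672Cul;   /* U+672C */
    case 0x8CEA: return 0x8A9Eul;   /* U+8A9E */
    default:     return REPLACEMENT; /* unmapped in this toy table */
    }
}

static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

static int is_sjis_trail(unsigned char c)
{
    return c >= 0x40 && c <= 0xFC && c != 0x7F;
}

/*
 * Decode n bytes of Shift_JIS into out[0..cap). Malformed or truncated
 * sequences yield U+FFFD and consume one byte, so the decoder resyncs
 * on the following byte. Returns the number of code points written.
 */
size_t sjis_decode_lenient(const unsigned char *in, size_t n,
                           unsigned long *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < n && o < cap) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                        /* ASCII passes through */
            i += 1;
        } else if (c >= 0xA1 && c <= 0xDF) {
            out[o++] = 0xFEC0ul + c;             /* half-width katakana, U+FF61.. */
            i += 1;
        } else if (is_sjis_lead(c) && i + 1 < n && is_sjis_trail(in[i + 1])) {
            out[o++] = lookup_sjis(c, in[i + 1]);
            i += 2;
        } else {
            out[o++] = REPLACEMENT;              /* damaged: substitute, resync */
            i += 1;
        }
    }
    return o;
}
```

A test of this kind would feed the converter a file with a byte chopped out of the middle of a double-byte character and check that only that one character is lost, not the rest of the line.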
The program is called jisuni ("for JIS to Unicode") and you can get the
sources and docs at <URL:http://www.threeweb.ad.jp/~havill/jisuni.zip>
(for the ZIP folks) or
<URL:http://www.threeweb.ad.jp/~havill/jisuni.tar.Z> (for the Un*x
folks).
This is alpha software (and only one part of eight)-- please don't
download it and use it for important stuff yet. When I finish it, I'll
let folks know. I'm still testing the Chinese and Korean tools, and the
character composer-decomposer for the Unicode-to-series. (And I haven't
finished the Japanese translation of the docs yet.)
One more question: what other free Unicode conversion tools are out
there? Not a lot of resources about that on the Unicode homepage... :(
I expect to have the KSC, GB, and Big5 converters finished in two weeks.
(It's not the code... the code is easy... it's the translation of the
docs.) By the way, our company has native Chinese speakers so that's not
a problem, but no native Korean speakers. Anyone want to help me with
the Korean translation of the docs?
-- Adrian Havill <URL:http://www.threeweb.ad.jp/> Engineering Division, System Planning & Production Section