Hi Jake:
I have been hacking away at Java's inadequacies in correctly recognizing
little-endian text as produced by NT. I think I have a software solution.
Now I need text data to test it.
Thanks for all your help so far. I will look at the pages you have
recommended and will keep you updated.
The Java converters are very picky about the text you provide. But a simple
byte "flipper" worked for me. :)
Mustafa
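A minimal sketch of the kind of byte flipper mentioned above, assuming the input is 16-bit Unicode text stored low byte first. The class and method names are hypothetical, and this is not necessarily the code the message refers to. It swaps each pair of bytes and then decodes the result as big-endian UTF-16:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class and method names are illustrative only.
final class ByteFlipper {

    // Returns a copy of the input with each adjacent pair of bytes swapped.
    // A trailing odd byte, if any, is copied unchanged.
    static byte[] swapPairs(byte[] in) {
        byte[] out = in.clone();
        for (int i = 0; i + 1 < out.length; i += 2) {
            byte tmp = out[i];
            out[i] = out[i + 1];
            out[i + 1] = tmp;
        }
        return out;
    }

    // Decodes little-endian 16-bit text by flipping it to big-endian first.
    static String decodeLittleEndian(byte[] in) {
        return new String(swapPairs(in), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "Hi" stored low byte first
        byte[] littleEndian = {0x48, 0x00, 0x69, 0x00};
        System.out.println(decodeLittleEndian(littleEndian)); // prints "Hi"
    }
}
```

If the input starts with a byte order mark (FF FE), this sketch keeps it as a leading U+FEFF character in the decoded string, so a caller may want to strip it.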
On Sun, 8 Mar 1998, Jake Morrison wrote:
> Mustafa,
>
> An excellent source is the pages for the 10th International Unicode
> Conference:
> http://www.unicode.org/unicode/iuc10/languages.html
>
> It also has the data in Unicode, so you can check your work.
>
> Another option is to surf the home pages for the major Asian companies.
>
> If you want lots of random text (sometimes very random :-), you can get
> messages from Usenet news.
>
> The tw.* hierarchy is from Taiwan
> The hk.* hierarchy is from Hong Kong
> The fj.* hierarchy is from Japan
> The han.* hierarchy is from Korea
>
> Regards,
> Jake
>
> On Sun, 8 Mar 1998, Mustafa Hasham wrote:
>
> >
> > Hi:
> >
> > As part of a project in a CS class, I intend to convert CJK encoded text
> > files into Unicode. I am using Windows NT and program in Java. Does anyone
> > out there know of any sample text files I can use? Any encoding scheme
> > would be fine... Big5, Kanji, GB, etc.. I do not have access to an input
> > editor.
> >
> > Thanks
> >
> > Mustafa
> >
> >
>
>