On 1999-07-25 at 22:27, Viranga Ratnaike wrote:
> Is there any freely available data encoded as either UTF-8 or UTF-16.
<http://titus.uni-frankfurt.de/unicode/unitest.htm#samples>
<http://pantheon.yale.edu/~jshin/faq/utf8_kr.html>
On 1999-07-25 at 22:27, Viranga Ratnaike wrote:
> But this is inconvenient as it's embedded in html.
You can try to get rid of the HTML tags, in two ways:
- Via cut-and-paste: mark the text in your browser, copy it,
paste it into your word processor, then store it as UTF-8
or UTF-16 plain text. I have tried this with Netscape
Communicator 4.05 and MS-Word 97, and it essentially works.
- Store a copy of the HTML file and open it in an HTML-aware word
processor, exploiting its HTML input conversion; then store it
as a Unicode plain text file. I have tried this with Netscape
Communicator 4.05 and MS-Word 97, and it essentially works.
In my first test, the original line-breaks from the HTML page were kept
in the plain text file. In my second test, the original paragraph-ends were
kept as line-breaks in the plain text, and a particular tag (viz. DIV) was
not removed. Your mileage may vary, depending on the software versions
used.
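
If you would rather script the conversion than go through a word processor, something along the following lines might do. This is only a rough sketch in Python, not what I used in the tests above: the names TextExtractor, html_to_text and save_as_unicode_text are made up for illustration, the list of block-level tags is an assumption, and real pages may need more care (tables, PRE blocks, character set detection).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text content and drops the tags."""

    # Assumption: these tags end a line in the plain-text output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Content of these elements is not visible text, so it is skipped.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns references such as &ouml; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def save_as_unicode_text(html, path, encoding="utf-8"):
    """Write the extracted text to path; encoding may be "utf-8" or "utf-16"."""
    with open(path, "w", encoding=encoding, newline="\n") as out:
        out.write(html_to_text(html))
```

Calling save_as_unicode_text(page, "sample.txt", encoding="utf-16") on a page you have already read into a string would write a UTF-16 file; Python's "utf-16" codec puts a byte order mark at the start. As with the word-processor route, where the line breaks end up depends on how the page is marked up.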
Best wishes,
Otto Stolz