Thank you all. We're clearly well on the road, though we haven't arrived yet. Here are
a few observations with NT 4.0 and Office 97, using the Bitstream Cyberbit font
handed out at IUC9:
Charles> I have added ...
Charles> http://194.75.134.50/unicode/iuc10/x-ucs2l.html
Charles> (UCS-2, least significant byte first, MicrosoFFFE)
Thank you for going to this trouble. My first experiences with this are:
o Netscape 3.0 loads the page, shows the first couple dozen characters (as
ASCII/garbage); when I attempt to download it, Netscape similarly truncates
the file very early
o MS IE 3.0 cannot open the page
o Word 97 opens it (via the procedure below) as correct Unicode plaintext
HTML source
o Word 97 Save As ... Unicode Text correctly writes this as a
MicrosoFFFE file that can e.g. be read by NT Notepad
o Clipboard copy/paste to NT Notepad also works
o Clipboard paste to PowerPoint 97 is rejected ("error")
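For anyone following along, here is a minimal Python sketch of what the two
UCS-2 byte orders discussed in this thread look like as raw bytes. It is only
an illustration: the sample string is made up, Python's utf-16 codecs stand in
for UCS-2 (the two agree for characters in the Basic Multilingual Plane), and
it says nothing about how Netscape, IE, or Word actually read these pages.

```python
import codecs

# Hypothetical sample text; any short BMP string would do.
text = "<html>"

le = text.encode("utf-16-le")   # least significant byte first
be = text.encode("utf-16-be")   # most significant byte first

# A little-endian file with a byte order mark starts with bytes FF FE.
le_with_bom = codecs.BOM_UTF16_LE + le

# Reading big-endian bytes as little-endian swaps each byte pair, so
# "<" (U+003C) comes out as U+3C00 instead.
misread = be.decode("utf-16-le")

print(le_with_bom[:4].hex())
print(be[:2].hex())
print(hex(ord(misread[0])))
```

The point of the sketch is just that the same text yields different byte
sequences in the two orders, and that a reader guessing the wrong order sees
unrelated characters rather than an error.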
Charles> http://194.75.134.50/unicode/iuc10/x-ucs2.html
Charles> (UCS-2, most significant byte first)
o Word 97 opens the first several lines as correct plaintext HTML source,
then starts a huge stream of random bytes right in the middle of the first
<img> tag, namely after "... <img a" (i.e. it goes bonkers after the "a" in
"alt")
Chris> Select this URL below
Chris> http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html
Chris> Edit/Copy
Chris> File/Open (in Word97)
Chris> Paste into the filename box
Chris> OK
This works beautifully, thank you! Word 97 Save As ... Unicode Text also
correctly writes this as a MicrosoFFFE text file, thus providing perhaps the
simplest path to extract all the text back out of this page.
I also tried these Unicode multilingual sample pages:
http://www.lang.duke.edu/unichtm/unilang8.htm -- presence/absence of BOM
unknown
o Netscape 3.0 (with Registry hack) loads the page fine
o Clipboard copy/paste to NT Notepad treats text as ASCII, i.e.
high-order characters are garbled
o Word 97 opens the page as ASCII, with high-order characters garbled
http://www.lang.duke.edu/unichtm/unilang.htm -- little-endian UCS-2,
presence/absence of BOM unknown
o Word 97 opens the page correctly
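One way to settle the "presence/absence of BOM unknown" question would be to
look at the first bytes of a saved copy of each page. The following is a
sketch only: the function name sniff_bom and the sample bytes are made up for
illustration, and it checks nothing beyond the leading bytes.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess an encoding from a leading byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    # No BOM found: the byte order (or encoding) has to come from elsewhere.
    return "unknown"

# Hypothetical first bytes of a little-endian page that starts with "<h".
print(sniff_bom(b"\xff\xfe<\x00h\x00"))
```

A result of "unknown" does not mean the page is ASCII; it only means the file
does not announce its encoding with a BOM.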
Joe