Re: Translated IUC10 Web pages: Experimental Results

From: becker.osbu_north@xerox.com
Date: Tue Feb 04 1997 - 14:53:31 EST


Thank you all. We're clearly well on the road, though we have not yet arrived.
Here are a few observations with NT 4.0 and Office 97, using the Bitstream
Cyberbit font handed out at IUC9:

Charles> I have added ...
Charles> http://194.75.134.50/unicode/iuc10/x-ucs2l.html
Charles> (UCS-2, least significant byte first, MicrosoFFFE)

Thank you for going to this trouble. My first experiences with this are:

    o Netscape 3.0 loads the page but shows only the first couple dozen
characters (as ASCII/garbage); when I attempt to download the file, Netscape
similarly truncates it very early

    o MS IE 3.0 cannot open the page

    o Word 97 opens it (via the procedure below) as correct Unicode plaintext
HTML source

        o Word 97 Save As ... Unicode Text correctly writes this as a
MicrosoFFFE file that can e.g. be read by NT Notepad

        o Clipboard copy/paste to NT Notepad also works

        o Clipboard paste to PowerPoint 97 is rejected ("error")

Charles> http://194.75.134.50/unicode/iuc10/x-ucs2.html
Charles> (UCS-2, most significant byte first)

    o Word 97 opens the first several lines as correct plaintext HTML source,
then starts a huge stream of random bytes right in the middle of the first
<img> tag, namely after "... <img a" (i.e. it goes bonkers after the "a" in
"alt")

Chris> Select this URL below
Chris> http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html
Chris> Edit/Copy
Chris> File/Open (in Word97)
Chris> Paste into the filename box
Chris> OK

This works beautifully, thank you! Word 97 Save As ... Unicode Text also
correctly writes this as a MicrosoFFFE text file, thus providing perhaps the
simplest path to extract all the text back out of this page.
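
For anyone who wants to script the same extraction rather than go through Word, here is a minimal sketch of the conversion. It assumes that "Unicode Text" means little-endian 16-bit code units preceded by the FF FE byte-order mark, which is what the observations above suggest; the function name utf8_to_unicode_text is hypothetical, not part of any tool mentioned here.

```python
import codecs


def utf8_to_unicode_text(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 bytes as little-endian 16-bit text with a leading BOM.

    Assumption: this approximates the FF FE-marked "Unicode Text" layout
    described in the message; it is not a specification of Word's output.
    """
    text = utf8_bytes.decode("utf-8")
    # codecs.BOM_UTF16_LE is the two bytes FF FE.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Note that Python's "utf-16-le" codec emits surrogate pairs for characters outside the 16-bit range, which plain UCS-2 cannot represent; for the sample pages discussed here that difference should not matter.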

I also tried these Unicode multilingual sample pages:

http://www.lang.duke.edu/unichtm/unilang8.htm -- presence/absence of BOM
unknown

    o Netscape 3.0 (with Registry hack) loads the page fine

        o Clipboard copy/paste to NT Notepad treats text as ASCII, i.e.
high-order characters are garbled

    o Word 97 opens the page as ASCII, with high-order characters garbled

http://www.lang.duke.edu/unichtm/unilang.htm -- little-endian UCS-2,
presence/absence of BOM unknown

    o Word 97 opens the page correctly
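
Since the presence or absence of a BOM is unknown for both Duke pages, one way to settle it would be to look at the first few bytes of each downloaded file. The sketch below is only an illustration: the function name sniff_bom and its return labels are hypothetical, and it checks just the three marks relevant to this thread.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Return a label for the byte-order mark at the start of data, or "none".

    Only the UTF-8 and 16-bit marks are checked. A file starting with
    FF FE 00 00 (a 32-bit little-endian mark) would be reported here as
    16-bit little-endian; that case is deliberately ignored in this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "16-bit little-endian with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "16-bit big-endian with BOM"
    return "none"
```

A result of "none" does not by itself say which encoding the file uses; it only says that no mark was found, so the byte order would still have to be inferred from the content.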

Joe


