From: Allen Haaheim (haaheima@interchange.ubc.ca)
Date: Sun Mar 23 2003 - 17:20:24 EST
> I tried what you suggested with unipad, but for some reason it went to a
> location on a PUA character map, rather than CJK Unified Ideographs
> Extension B, where they are in fact located. I guess it is because Unipad
> doesn't support Extension B yet, or else I am doing something wrong. But
> thanks for directing me to the Unipad website, I'm sure it will be useful.
In UTF-16, code points above U+FFFF are represented by pairs of code units
from the Surrogates Area (U+D800 to U+DFFF), not from the Private Use Area.
Unipad should be able to show those Surrogate Area values.
The code points of the blanks on the website are really in the PUA.
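
As an illustration of the surrogate-pair point, here is a minimal Python sketch of the standard UTF-16 arithmetic, applied to the two Extension B code points discussed in this thread. The function name to_surrogate_pair is made up for this example; it is not something Unipad or the CHANT site provides.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


for cp in (0x2835C, 0x283B9):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")
# U+2835C -> D860 DF5C
# U+283B9 -> D860 DFB9
```

Both values fall in D800 to DFFF, so a tool that reports E596 or E58E is looking at genuine Private Use Area characters, not at halves of a surrogate pair.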
I don't completely follow you here, but I can see the code points are as you
say.
However, I have (correct) hard copies of the text, and there is no doubt
that the chars that should be there are U+2835C and U+283B9, in the Ext. B
chart. The website is unlikely to have the wrong chars. But
there's definitely something wrong somewhere--maybe it is their
fonts. As with you, only two of their fonts seem to work for me.
> Here is a sample line of text with the two graphs as blanks (on my
> machine),
> second and third from the left. They are No. 2835C and 283B9 respectively,
> on p.152 of the Extension B pdf:
>
> 心而鮮歡。望天涯而佇念,擢雄劍而長歎。
The second and third chars are U+E596 and U+E58E.
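
For anyone without Unipad, the same check can be sketched in a few lines of Python. The sample string below is an assumption: it is rebuilt from the code points reported above (心, then U+E596 and U+E58E, then the following characters), since the two blanks do not survive in the archived text. The helper name code_points is hypothetical.

```python
def code_points(text):
    """Return the code point of every character in text as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Assumed reconstruction of the start of the sample line: the first
# character, the two PUA values reported above, then the next three.
sample = "心" + "\uE596\uE58E" + "而鮮歡"

for position, label in enumerate(code_points(sample), start=1):
    print(position, label)
```

Positions 2 and 3 print as U+E596 and U+E58E, which matches what Unipad reports for the blanks.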
> The page this text is from is
> http://www.chant.org/scripts/frame.asp?t=b&id=000675 I don't think
> you'll get into the site unless you or your university is a member.
I can access the fonts at:
http://www.chant.org/info/download_font.asp
Only two of the fonts work on my Windows 98 SE machine: ICS3 and ICS4.
If I copy the Chinese text to WordPad and change the font, the second and
third chars become Chinese chars.
But in ICS3 they look very different from ICS4.
Neither is right. ICS3 and ICS4 are both for the Oracle Bone Script
database. With our sample text, ICS3 displays OBS graphs (i.e. not standard
Chinese), and ICS4 displays Chinese gibberish. Our text is from the Pre-Han
& Han and Six Dynasties databases, which use their ICS1, ICS2, and ICS6
fonts. As you say, it seems these fonts don't work. If I run a Windows
search, the results show that they are all in the Windows/fonts folder.
But when I look in the folder itself, or in the font box in Word 2000,
they aren't there. I guess this has something to do with them not working.
The following case might confirm it's a problem with the website's fonts:
http://www.chant.org/scripts/zj/scripts/frame.asp?t=b&id=000869
text (Shijing #57, last line):
庶姜,庶士有朅!
Unipad shows the code points of the third and fourth char from the left (the
same character) to be U+E053. But the character that belongs there is
U+5B7D, as another website http://210.69.170.100/s25/index.htm (Han Quan),
shows in the same line of text: 庶姜孽孽,庶士有朅。And this character is not even in
Ext A or B, but in the basic CJK Unified Ideographs block.
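
To make the block comparison concrete, here is a rough Python sketch that sorts a code point into the blocks mentioned in this thread. The ranges are the block boundaries as I understand them from the Unicode charts (Extension B taken as U+20000 to U+2A6DF), and the function name block_of is made up for the example.

```python
# (first, last, name) for the blocks discussed in this thread.
BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xD800, 0xDFFF, "Surrogates Area"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def block_of(code_point):
    """Name the block containing code_point, or 'other' if not listed."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return "other"


for cp in (0xE053, 0x5B7D, 0x5FA7, 0x8B81, 0x2835C):
    print(f"U+{cp:04X}: {block_of(cp)}")
# U+E053: Private Use Area
# U+5B7D: CJK Unified Ideographs
# U+5FA7: CJK Unified Ideographs
# U+8B81: CJK Unified Ideographs
# U+2835C: CJK Unified Ideographs Extension B
```

So the value CHANT serves (U+E053) is a private-use code point whose meaning depends entirely on the site's own fonts, while the expected character has an ordinary code point in the basic block.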
There are other cases where both these sites do not display the character
(that is, if the problem is not at my end) (Shijing #40, line 11):
1) 室人交我。
http://www.chant.org/scripts/zj/scripts/frame.asp?t=b&id=000869
The fourth and fifth characters should be 徧 U+5FA7 and 讁 U+8B81, but Unipad
shows they are U+E052 and U+E536.
2) 室人交遍謫我。
http://210.69.170.100/s25/index.htm (Han Quan)
One would also expect Han Quan, like CHANT, to be rigorous and precise. Here
there are substitutions for the two characters in question, followed by a
blank that Unipad indicates is U+F6B1. Such substitutions should only be
necessary when the actual characters are unavailable. What is behind the
blank I'm not sure, but it may be a note explaining the substitutions. But
again, all four of the characters can be found in the basic CJK charset, not
even Ext A or B. I suppose the websites are not using Unicode charsets?
Thanks again for your remarks and suggestions.--Allen