From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Nov 06 2003 - 08:38:40 EST
On Wed, 5 Nov 2003 12:24:00 +0100, "Philippe Verdy" wrote:
>
> The obliterated character needed for paleographic studies, or to encode any
> texts in which the character is not recognizable already exists: isn't it
> the REPLACEMENT CHARACTER?
>
The problem of how to represent missing/obliterated characters in Unicode when
transcribing manuscript/printed texts and inscriptions, etc. has always
perplexed me.
U+FFFD [Replacement Character] is "used to replace an incoming character whose
value is unknown or unrepresentable in Unicode", and is definitely not the
correct character to use to represent a missing or obliterated character in a
non-electronic source text.
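To illustrate the role U+FFFD is meant to play, here is a minimal Python sketch
(my own illustration, assuming Python 3 and its built-in UTF-8 decoder; the
input bytes are made up). The decoder substitutes U+FFFD for a byte it cannot
interpret, which says something about the incoming data stream and nothing
about damage in a manuscript or inscription.

```python
# Hypothetical input: the byte 0xFF can never occur in well-formed UTF-8.
raw = b"abc\xffdef"

# errors="replace" asks the decoder to substitute U+FFFD for undecodable bytes.
decoded = raw.decode("utf-8", errors="replace")

# List the code points of the decoded string.
print([f"U+{ord(ch):04X}" for ch in decoded])
```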
For Chinese the standard glyph for a missing/obliterated/unclear ideograph is a
full-width hollow square (i.e. the same size as a CJK ideograph). This glyph is
very common in modern printed Chinese texts, from scholarly editions of ancient
texts unearthed from 2,000-year-old tombs to popular typeset reprints of
19th-century novels. Several examples of the usage of this glyph in modern printed
texts from the PRC can be found at
http://uk.geocities.com/babelstone1357/CJK/missing.html
The problem is how to represent this glyph in electronic texts. From browsing
the internet, there seem to be two ways of representing this "missing
ideograph" glyph, both unsatisfactory:
1. Using U+25A1 [WHITE SQUARE] (although any of the other white square
   graphic symbols encoded in Unicode, such as U+25A2, U+25FB or U+25FD, could
   also be used I suppose). The problems with this character are:
   a) it has the wrong character properties for use within running CJK text.
   b) with CJK fonts such as SimSun, U+25A1 is rendered the same height and
      width as a CJK ideograph, but with non-Chinese fonts such as Arial
      Unicode MS, U+25A1 may be rendered much smaller than a CJK ideograph,
      which looks totally wrong.
2. Using U+56D7 [a CJK ideograph, rarely used other than as a radical =
   U+2F1E], which has the right character properties, and renders at the
   correct size; but the glyph shape may not be completely square depending
   upon the font style, and basically it is just the wrong character for the
   job.
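The property differences between these candidates can be inspected with
Python's standard unicodedata module. This is only a sketch: which properties
matter for running CJK text is my assumption here (General Category and East
Asian Width), the helper name describe is hypothetical, and the values reported
depend on the Unicode version bundled with the Python build.

```python
import unicodedata

def describe(code_point: int) -> tuple:
    """Return (name, General Category, East Asian Width) for a code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

# Candidates discussed above: WHITE SQUARE, the ideograph U+56D7, and U+FFFD.
for cp in (0x25A1, 0x56D7, 0xFFFD):
    name, category, width = describe(cp)
    print(f"U+{cp:04X}  {category}  {width}  {name}")
```

U+25A1 should come out as a symbol (So) with ambiguous width (A), whereas
U+56D7 is a letter (Lo) with wide width (W), which is one way of seeing why the
white square behaves differently from an ideograph in CJK text.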
It would be extremely useful to have a dedicated Unicode character for "missing
CJK ideograph" with the right character properties, and I have considered making
a proposal for such a character. I have hesitated, however: if there really is
such a great need for it (and I personally have web pages which transcribe texts
with missing/obliterated ideographs where such a character is desperately
needed), then why does it not already exist in Unicode or in pre-existing
Chinese encoding standards?
Andrew