If you look on page 6-94 of Unicode 2.0, it says, "U+3000 ...
is provided for compatibility." It is also mentioned on page
6-130 in the description for the Halfwidth and Fullwidth Forms
block:
"Unifications. The fullwidth form of U+0020 SPACE is unified
with U+3000 IDEOGRAPHIC SPACE."
It's not at all clear to me what that is supposed to mean. I
thought characters that were unified in Unicode ended up with a
single encoding. It did suggest to me, however, a relationship
between U+3000 and U+0020 comparable to that of U+FF01 and
U+0021 - though perhaps I was reading too much into it. In
terms of pure, plain-text semantics, it seems that U+3000
relates to U+0020 in precisely the same way that U+FF01 relates
to U+0021.
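
One way to check that parallel is to compare the compatibility mappings of the two characters. The following is only a sketch: it relies on the character database bundled with Python's unicodedata module, which is far newer than Unicode 2.0, and the helper name wide_mapping_target is made up for illustration.

```python
import unicodedata

def wide_mapping_target(ch):
    """Return the target of a <wide> compatibility mapping, or None.

    Hypothetical helper: it only understands single-character
    mappings of the form "<wide> XXXX".
    """
    fields = unicodedata.decomposition(ch).split()
    if len(fields) == 2 and fields[0] == "<wide>":
        return chr(int(fields[1], 16))
    return None

# The two pairs discussed above: (wide form, nominal counterpart).
pairs = [("\u3000", "\u0020"), ("\uff01", "\u0021")]

for wide, nominal in pairs:
    print(
        f"U+{ord(wide):04X}",
        unicodedata.decomposition(wide),
        unicodedata.east_asian_width(wide),
        wide_mapping_target(wide) == nominal,
    )
```

If both lines report the same kind of mapping, that would at least support reading the two relationships as parallel in the data, whatever the block description intends.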
Between those two references, I concluded that U+3000 should be
treated the same way as characters in the range U+FF01-FF5E.
The following, also from page 6-130, is relevant as well:
<quote>
The characters in this block consist of fullwidth forms of the
ASCII block (except SPACE)... As with other compatibility
characters, the preferred Unicode encoding is to use the
nominal counterparts of these characters and use rich text font
or style bindings to select the appropriate glyph size and
width.
</quote>
This seems to require option 2 (and to denigrate option 1), but
surely option 4 would also be considered acceptable - i.e. I
think the main point is to avoid encoding text using
compatibility characters if possible. Whether or not U+3000 is
to be treated like characters in the compatibility area is
still not clear to me.
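
To make the "nominal counterparts" recommendation concrete, here is a rough sketch of folding fullwidth ASCII forms back to ASCII. The function name to_nominal and the fold_ideographic_space flag are my own inventions; the flag is off by default precisely because whether U+3000 belongs with the fullwidth forms is the open question.

```python
# Fullwidth ASCII variants U+FF01..U+FF5E sit at a fixed offset
# from their nominal counterparts U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0

def to_nominal(text, fold_ideographic_space=False):
    """Replace fullwidth ASCII forms with their nominal counterparts.

    U+3000 is only folded to U+0020 when the caller asks for it.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        elif cp == 0x3000 and fold_ideographic_space:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

sample = "\uff21\uff01\u3000"  # fullwidth A, fullwidth !, ideographic space
print(repr(to_nominal(sample)))
print(repr(to_nominal(sample, fold_ideographic_space=True)))
```

A rich-text layer would then be responsible for choosing wide glyphs where needed, which is outside what this sketch attempts.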
Peter
From: <Marco.Cimarosti@icl.com> AT Internet on 12/09/99 12:47 PM
Received on: 12/09/99
To: Peter Constable/IntlAdmin/WCT, <unicode@unicode.org> AT Internet@Ccmail
cc:
Subject: Re: EA width, Latin punctuation and fonts
Peter Constable says:
>1. Include both wide and narrow glyphs in a single font and
>encode text using U+3000 etc. (i.e. encode using compatibility
>characters).
Why do you say that U+3000 is a compatibility character? My
understanding of a "compatibility character" in Unicode is a
character that either:
1) has the word "COMPATIBILITY" in its name (from
ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt)
2) is in a block that has the word "Compatibility" in its name
(from
ftp.unicode.org/Public/UNIDATA/Blocks.txt)
U+3000 seems to fall in neither of these cases: it just has a
compatibility mapping ("<wide> 0020"), as many other characters
do.
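
The two tests above, plus the separate question of whether a character merely has a compatibility mapping, could be sketched like this. It is an illustration under assumptions: BLOCKS_EXCERPT is a small hand-copied excerpt of Blocks.txt rather than the full file, the function names are hypothetical, and the data comes from Python's bundled unicodedata rather than the files named above.

```python
import unicodedata

# Hand-copied excerpt of Blocks.txt (assumption: only the blocks
# relevant to this thread are listed, not the whole file).
BLOCKS_EXCERPT = [
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3300, 0x33FF, "CJK Compatibility"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0xFE30, 0xFE4F, "CJK Compatibility Forms"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
]

def block_name(ch):
    """Look up the block name in the excerpt, or '' if not listed."""
    cp = ord(ch)
    for first, last, name in BLOCKS_EXCERPT:
        if first <= cp <= last:
            return name
    return ""

def is_compat_by_name_or_block(ch):
    """Tests 1 and 2: COMPATIBILITY in the character or block name."""
    char_name = unicodedata.name(ch, "")
    return ("COMPATIBILITY" in char_name
            or "compatibility" in block_name(ch).lower())

def has_compat_mapping(ch):
    """True if the decomposition carries a tag such as <wide>."""
    return unicodedata.decomposition(ch).startswith("<")

for ch in ("\u3000", "\u3300", "\uf900", "\uff01"):
    print(f"U+{ord(ch):04X}",
          is_compat_by_name_or_block(ch),
          has_compat_mapping(ch))
```

Note that with this excerpt the block test does not flag U+FF01 either, since its block is named "Halfwidth and Fullwidth Forms".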
Marco
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT