L2/99-389
From: John Jenkins [jenkins@apple.com]
Sent: Wednesday, December 15, 1999 2:42 PM
Subject: Report from Singapore
Sorry about the delay in getting this out, but charts.unicode.org and I have
been having a long tête-à-tête which has consumed much of my time.
The fourteenth meeting of the IRG was held December 6-10, 1999 at the
offices of Sybase in Singapore. Representatives were there from mainland
China, Taiwan, the HKSAR, Japan, South Korea, Vietnam, Singapore, the US,
and Unicode. (Yes, this means that the IRG actually has *three* Chinas.)
Zhang Zhoucai (China) continued acting ably as rapporteur.
L2 was represented by Michel Suignard (Microsoft, the head of delegation),
Hideki Hiura (Sun), and Michael Yau (Oracle). Michael Kung (also Microsoft)
joined the L2 delegation for the last two days of the meeting. I represented
Unicode.
The UTC had two items it wanted the IRG to deal with at the meeting. One
was the suggested use of 56 code points in the current CJK Compatibility
Ideographs Block for new compatibility ideographs for JIS X 0213; the other
was a document reviewed by the UTC recommending 313 ideographs be added to
the BMP for the sake of JIS X 0213.
##### Compatibility ideographs
Compatibility ideographs were a major item for discussion, as it happened.
Sato Takayuki was part of the Japanese delegation and was also anxious to
have the issue discussed. Moreover, the Taiwanese delegation had two
proposals dealing with compatibility ideographs. One was the allocation of
a 2048 code point block in Plane 2 for compatibility ideographs, and the
other was the addition of 464 compatibility ideographs for CNS 11643-1992 in
that block.
Basically, compatibility ideographs are currently a WG2 matter, not an IRG
one. The current compatibility ideographs are in the standard because the
UTC required them, not because the IRG did. Still, there was general
consensus on a number of points:
1) The desire to have compatibility ideographs added to the standard for
the sake of full round-trip compatibility with major standards is
reasonable, so long as the numbers stay small (ca. 60 for JIS X 0213, ca.
500 for CNS 11643-1992, and ca. 100 for the Hong Kong SAR).
2) Unification *between* compatibility ideograph sets is reasonable, and
the IRG is willing to take on the responsibility for doing that unification
work, if WG2 asks it.
3) There is a potential problem if people start coming up with large
character sets rife with compatibility ideographs, e.g., EACC. Some way of
avoiding that will be necessary; we don't want to turn Plane 2 into a
full-fledged glyph registry for ideographs. My own recommendation was that
we limit the sources for compatibility ideographs to the IRG's formal
sources, but there was no resolution of the issue.
4) In addition to the identified sources of compatibility ideographs, Korea
and mainland China may also request some. People haven't been thinking in
terms of compatibility ideographs hitherto, and so nobody's really given
much thought to the issue of what they might actually need, other than that
it will basically be a corner-filling-in process.
5) Among the data desired for compatibility ideographs is the code point of
the character of which it is a variant -- e.g., our compatibility mappings
in the Unicode database. The issue of whether this should include radicals
was unresolved. That is, if a candidate for encoding is a variant of a
character in one of the two radicals blocks in the standard, should it be
encoded as a unique ideograph or as a compatibility ideograph? I punted on
this one and abstained, as I felt this is an issue which the UTC would want
to discuss before taking a stand.
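For readers unfamiliar with the compatibility mappings mentioned in point 5, the following is a minimal sketch of how such a mapping can be looked up today, assuming Python's standard `unicodedata` module (which reflects whatever Unicode version ships with the interpreter, not the 1999 data). The helper name `compatibility_target` is hypothetical and is used only for illustration.

```python
import unicodedata


def compatibility_target(ch):
    """Return the code points that `ch` decomposes to, or [] if it has none."""
    fields = unicodedata.decomposition(ch).split()
    # Some decompositions carry a leading tag such as "<compat>"; skip it.
    if fields and fields[0].startswith("<"):
        fields = fields[1:]
    return [int(field, 16) for field in fields]


# U+F900 is a compatibility ideograph with a recorded mapping;
# U+FA0E sits in the same block but has no mapping.
for ch in ("\uF900", "\uFA0E"):
    targets = compatibility_target(ch)
    mapped = " ".join(f"U+{t:04X}" for t in targets) or "(no mapping)"
    print(f"U+{ord(ch):04X} -> {mapped}")
```

A record for a proposed compatibility ideograph would presumably carry this target code point alongside its source reference; the exact record layout was not settled at the meeting.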
I did, BTW, express support for Taiwan's proposal. Inasmuch as the UTC has
started to accept new compatibility ideographs for major standards, and
inasmuch as Taiwan's set was relatively small (for CNS) and was going to be
stuck off on Plane 2, I felt it was a reasonable thing to do.
##### Vertical Extension B
The other major item of business was Vertical Extension B. As expected, the
current version does *not* cover all of JIS X 0213, but, also as expected,
the actual holes aren't the ones Cora found. Still, the Japanese delegation
had a list of everything needed to cover JIS X 0213 to their satisfaction
and there was general consensus to approve that. China and Taiwan also had
longish lists of characters they needed to add, which were generally
accepted.
As a business item for L2, it will be necessary in our vote on the CD for
10646-2 to request that Vertical Extension B be updated per the Editor's
report from the IRG meeting. Whether we want to make this a "yes-comment"
or a "no-comment" depends on how we feel we can best make sure the change
can get made without endangering the schedule for 10646-2. Apple would
still be of the opinion that 10646-2 *without* the revision of Vertical
Extension B would be unacceptable.
One issue which came up, however, during the editorial meeting was the
status of the non-cognate rule. This rule, found in Annex S of 10646 and on
page 6-109 of Unicode 2.0, states that two similar characters which are
"unrelated in historical derivation" are not unified. (It is also implied
on page 6-108 where we state that to be candidates for unification, two
ideographs must "occupy a single point on the X [semantic] and Y [abstract
shape] axes" of the three-axis model.)
The problem is that there has been no formal specification for when to
invoke the non-cognate rule, and the IRG has not been paying much attention
to it in the work on Vertical Extension B. Japan, however, invoked it to
deunify one of the characters from JIS X 0213, and so the issue becomes
important because the character in question will need to be a compatibility
ideograph if it cannot be added in its own right via the non-cognate rule.
The IRG adopted a number of proposals to deal with the issue. One is that
the two characters are to be considered non-cognates if both occur in the
same one of the IRG's standard dictionaries and have different meanings.
(What precisely "different meanings" means was left vague.) Other evidence
may be considered if needed and approved by the IRG.
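The dictionary test just described can be sketched as a small predicate. This is only an illustration under stated assumptions: the data layout (dictionary name mapped to per-character glosses), the function name `are_non_cognate`, and the toy entries are all hypothetical, and since the IRG left "different meanings" vague, the comparison is passed in as a parameter rather than fixed.

```python
def are_non_cognate(a, b, dictionaries, differs=lambda x, y: x != y):
    """Return True if some single dictionary lists both characters
    with meanings that `differs` judges to be different.

    `dictionaries` maps a dictionary name to {character: gloss}.
    """
    for entries in dictionaries.values():
        if a in entries and b in entries and differs(entries[a], entries[b]):
            return True
    return False


# Toy data, NOT taken from any real dictionary: simplified glosses for
# two look-alike characters (U+571F and U+58EB).
toy_dictionaries = {
    "example-dictionary": {"\u571F": "earth", "\u58EB": "scholar"},
    "other-dictionary": {"\u571F": "earth"},
}

print(are_non_cognate("\u571F", "\u58EB", toy_dictionaries))
```

Note that the rule requires both characters to appear in the same dictionary; two entries found only in different dictionaries do not satisfy it, and "other evidence" would need separate IRG approval.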
There was again a danger that China in particular might decide to start
invoking the non-cognate rule in cases where it's been overlooked hitherto
during the development of Vertical Extension B. This could add a lot of new
ideographs, destabilize Vertical Extension B, and slow down 10646-2. As a
result, the IRG resolved *not* to make any (more) changes to Vertical
Extension B based on the non-cognate rule. From this point on, if somebody
wants to invoke it, they have to wait until Vertical Extension C to do so.
##### How I violated my instructions
I didn't pursue the issue of the 313 JIS X 0213 characters, as I felt the
issue was moot. I discussed the issue of JIS X 0213 coverage with the
Japanese delegation, and Kobayashi Tatsuo in particular, and Kobayashi-san
assured me that only three ideographs in the Japanese proposal needed to be
added to Vertical Extension B for full coverage (together with Japan's list
of compatibility ideographs).
The matter, however, is still pending for two of the three ideographs Japan
needs in Vertical Extension B. Japan needs to provide evidence for its
invocation of the non-cognate rule to the editorial committee by the end of
the month. If it does so, they'll be added to the editor's report. We'll
need to keep an eye open to make sure this happens.
I also proposed something to help smooth things out in the future: a
U-source for the IRG. The U-source currently consists of all the characters
in the CJK Compatibility Ideographs block. Mappings to existing ideographs
were provided. The idea here was to make sure that the IRG *tracks* these
ideographs when it does its work. In particular, there are ten of them
which are *not* compatibility variants of ideographs but have been added to
the compatibility block because they come from industrial, not national,
standards. These ten are singled out in the Unicode 3.0 book as part of
Unicode's ideograph set.
The IRG is already tracking this set of characters in its SuperCJK database,
as it's been stung twice by characters found therein getting added as if
they weren't already in the standard. Having a U-source number for them,
however, makes sure that their origins are marked and helps guarantee that
we'll continue to keep track of them.
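As a rough illustration of the kind of tracking involved, the sketch below scans the CJK Compatibility Ideographs block and lists the assigned characters that carry no mapping to another ideograph, i.e. the ones that are not compatibility variants. It assumes Python's standard `unicodedata` module; the function name `unmapped_in_block` is hypothetical. The block has gained characters since 1999, so the result depends on the Unicode version bundled with the interpreter and need not match the count given above.

```python
import unicodedata

# CJK Compatibility Ideographs block: U+F900..U+FAFF.
BLOCK = range(0xF900, 0xFB00)


def unmapped_in_block():
    """Return assigned code points in the block that have no decomposition."""
    result = []
    for cp in BLOCK:
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":
            continue  # unassigned code point, nothing to track
        if not unicodedata.decomposition(ch):
            result.append(cp)
    return result


for cp in unmapped_in_block():
    print(f"U+{cp:04X}")
```

A U-source table would pair each remaining (mapped) character with its target ideograph, while the characters listed here would be tracked as ideographs in their own right.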
##### Action items for L2
We need to make adoption of the changes to Vertical Extension B proposed by
the IRG part of our vote on 10646-2.
We need to pin down our own position on compatibility ideographs, viz., are
we willing to add more beyond the JIS X 0213 set? Do we support having a
formal block for adding them? Are we willing to help sponsor Taiwan's
proposal? How do we think the candidates for compatibility ideographs
should best be limited? Should radicals be included among the
"compatibility variants" for compatibility ideographs? What data -- beyond
compatibility variant, source, glyph, and radical/stroke information -- do
we need?
Apple would also encourage us to help forward *Japan's* request for new
compatibility ideographs, and we should have a strategy in mind to do this. JIS
X 0213 is all but final now. Once we know exactly what characters are
needed, we should try to get them added in an expeditious fashion,
presumably as an amendment to part 1 of 10646.
I'm sure there's more that will occur to me as soon as I send this (ain't
that always the case?), but this is getting long enough as it is, so I'm
going to cut off here.
=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng