L2/99-389

From: John Jenkins [[email protected]]

Sent: Wednesday, December 15, 1999 2:42 PM

Subject: Report from Singapore

Sorry about the delay in getting this out, but charts.unicode.org and I have

been having a long tête-à-tête which has consumed much of my time.

The fourteenth meeting of the IRG was held December 6-10, 1999 at the

offices of Sybase in Singapore. Representatives were there from mainland

China, Taiwan, the HKSAR, Japan, South Korea, Vietnam, Singapore, the US,

and Unicode. (Yes, this means that the IRG actually has *three* Chinas.)

Zhang Zhoucai (China) continued acting ably as rapporteur.

L2 was represented by Michel Suignard (Microsoft, the head of delegation),

Hideki Hiura (Sun), and Michael Yau (Oracle). Michael Kung (also Microsoft)

joined the L2 delegation the last two days of the meeting. I represented

Unicode.

The UTC had two items it wanted the IRG to deal with at the meeting. One

was the suggested use of 56 code points in the current CJK Compatibility

Ideographs Block for new compatibility ideographs for JIS X 0213; the other

was a document reviewed by the UTC recommending 313 ideographs be added to

the BMP for the sake of JIS X 0213.

##### Compatibility ideographs

Compatibility ideographs were a major item for discussion, as it happened.

Sato Takayuki was part of the Japanese delegation and was also anxious to

have the issue discussed. Moreover, the Taiwanese delegation had two

proposals dealing with compatibility ideographs. One was the allocation of

a 2048 code point block in Plane 2 for compatibility ideographs, and the

other was the addition of 464 compatibility ideographs for CNS 11643-1992 in

that block.

Basically, compatibility ideographs are currently a WG2 matter, not an IRG

one. The current compatibility ideographs are in the standard because the

UTC required them, not because the IRG did. Still, there was general

consensus on a number of points:

1) The desire to have compatibility ideographs added to the standard for

the sake of full round-trip compatibility with major standards is

reasonable, so long as the numbers stay small (ca. 60 for JIS X 0213, ca.

500 for CNS 11643-1992, and ca. 100 for the Hong Kong SAR).

2) Unification *between* compatibility ideograph sets is reasonable, and

the IRG is willing to take on the responsibility for doing that unification

work, if WG2 asks it.

3) There is a potential problem if people start coming up with large

character sets rife with compatibility ideographs, e.g., EACC. Some way of

avoiding that will be necessary; we don't want to turn Plane 2 into a

full-fledged glyph registry for ideographs. My own recommendation was that

we limit the sources for compatibility ideographs to the IRG's formal

sources, but there was no resolution of the issue.

4) In addition to the identified sources of compatibility ideographs, Korea

and mainland China may also request some. People haven't been thinking in

terms of compatibility ideographs hitherto, and so nobody's really given

much thought to the issue of what they might actually need, other than that

it will basically be a corner-filling-in process.

5) Among the data desired for compatibility ideographs is the code point of

the character of which it is a variant -- e.g., our compatibility mappings

in the Unicode database. The issue of whether this should include Radicals

was unresolved. That is, if a candidate for encoding is a variant of a

character in one of the two radicals blocks in the standard, should it be

encoded as a unique ideograph or as a compatibility ideograph? I punted on

this one and abstained, as I felt this is an issue which the UTC would want

to discuss before taking a stand.
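
To make item 5 concrete, here is a rough sketch of looking up the character a compatibility ideograph is a variant of. It uses the decomposition field that Python's standard `unicodedata` module exposes from the Unicode database. The helper name `compatibility_target` is made up for illustration, and the data is whatever Unicode version ships with the interpreter, not the version under discussion here.

```python
import unicodedata


def compatibility_target(ch):
    """Return the ideograph that `ch` is a compatibility variant of, or None.

    Hypothetical helper: it relies only on the decomposition field of the
    Unicode database as bundled with the running Python interpreter.
    """
    decomp = unicodedata.decomposition(ch)
    # An empty string means no mapping; a leading "<tag>" marks a formatting
    # decomposition (e.g. <compat>, <wide>) rather than a plain singleton.
    if not decomp or decomp.startswith("<"):
        return None
    parts = decomp.split()
    if len(parts) != 1:
        return None
    return chr(int(parts[0], 16))


# U+F900 sits in the CJK Compatibility Ideographs block and maps to U+8C48.
print(compatibility_target("\uF900") == "\u8C48")
# U+FA0E is in the same block but has no mapping in the database.
print(compatibility_target("\uFA0E"))
```

Whether a radical may appear as the target of such a mapping is exactly the question left open above; the sketch takes no position on it.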

I did, BTW, express support for Taiwan's proposal. Inasmuch as the UTC has

started to accept new compatibility ideographs for major standards, and

inasmuch as Taiwan's set was relatively small (for CNS) and was going to be

stuck off on Plane 2, I felt it was a reasonable thing to do.

##### Vertical Extension B

The other major item of business was Vertical Extension B. As expected, the

current version does *not* cover all of JIS X 0213, but (also as expected)

the actual holes aren't the ones Cora found. Still, the Japanese delegation

had a list of everything needed to cover JIS X 0213 to their satisfaction

and there was general consensus to approve that. China and Taiwan also had

longish lists of characters they needed to add, which were generally

accepted.

As a business item for L2, it will be necessary in our vote on the CD for

10646-2 to request that Vertical Extension B be updated per the Editor's

report from the IRG meeting. Whether we want to make this a "yes-comment"

or a "no-comment" depends on how we feel we can best make sure the change

can get made without endangering the schedule for 10646-2. Apple would

still be of the opinion that 10646-2 *without* the revision of Vertical

Extension B would be unacceptable.

One issue which came up, however, during the editorial meeting was the

status of the non-cognate rule. This rule, found in Annex S of 10646 and on

page 6-109 of Unicode 2.0, states that two similar characters which are

"unrelated in historical derivation" are not unified. (It is also implied

on page 6-108 where we state that to be candidates for unification, two

ideographs must "occupy a single point on the X [semantic] and Y [abstract

shape] axes" of the three-axis model.)

The problem is that there has been no formal specification for when to

invoke the non-cognate rule, and the IRG has not been paying much attention

to it in the work on Vertical Extension B. Japan, however, invoked it to

deunify one of the characters from JIS X 0213, and so the issue becomes

important because the character in question will need to be a compatibility

ideograph if it cannot be added in its own right via the non-cognate rule.

The IRG adopted a number of proposals to deal with the issue. One is that

the two characters are to be considered non-cognates if both occur in the

same one of the IRG's standard dictionaries and have different meanings.

(What precisely "different meanings" means was left vague.) Other evidence

may be considered if needed and approved by the IRG.
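
The dictionary test can be written down as a small sketch. Everything in it is an assumption made for illustration: the function name, the shape of the data (dictionary name mapped to a table of character to meaning), and above all the use of simple inequality to stand in for "different meanings", which the IRG left undefined.

```python
def are_non_cognate(a, b, dictionaries):
    """Return True if some single dictionary lists both `a` and `b`
    with different meanings.

    `dictionaries` maps a dictionary name to a {character: meaning} table.
    Comparing meanings with != is a placeholder for a judgment the IRG
    has not specified.
    """
    for entries in dictionaries.values():
        if a in entries and b in entries and entries[a] != entries[b]:
            return True
    return False


# Hypothetical placeholder data, not real dictionary entries.
sample = {
    "dictionary_1": {"char_a": "meaning one", "char_b": "meaning two"},
    "dictionary_2": {"char_a": "meaning one", "char_c": "meaning one"},
}

print(are_non_cognate("char_a", "char_b", sample))  # both in dictionary_1, meanings differ
print(are_non_cognate("char_a", "char_c", sample))  # both in dictionary_2, same meaning
print(are_non_cognate("char_b", "char_c", sample))  # never in the same dictionary
```

Note that the check requires both characters to appear in the *same* dictionary; finding them in two different dictionaries with different glosses would not count under the rule as adopted, though it might be offered as "other evidence".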

There was again a danger that China in particular might decide to start

invoking the non-cognate rule in cases where it's been overlooked hitherto

during the development of Vertical Extension B. This could add a lot of new

ideographs, destabilize Vertical Extension B, and slow down 10646-2. As a

result, the IRG resolved *not* to make any (more) changes to Vertical

Extension B based on the non-cognate rule. From this point on, if somebody

wants to invoke it, they have to wait until Vertical Extension C to do so.

##### How I violated my instructions

I didn't pursue the issue of the 313 JIS X 0213 characters, as I felt the

issue was moot. I discussed the issue of JIS X 0213 coverage with the

Japanese delegation, and Kobayashi Tatsuo in particular, and Kobayashi-san

assured me that only three ideographs in the Japanese proposal needed to be

added to Vertical Extension B for full coverage (together with Japan's list

of compatibility ideographs).

The matter, however, is still pending for two of the three ideographs Japan

needs in Vertical Extension B. Japan needs to provide evidence for its

invocation of the non-cognate rule to the editorial committee by the end of

the month. If it does so, they'll be added to the editor's report. We'll

need to keep an eye open to make sure this happens.

I also proposed something to help smooth things out in the future: A

U-source for the IRG. The U-source currently consists of all the characters

in the CJK Compatibility Ideographs block. Mappings to existing ideographs

were provided. The idea here was to make sure that the IRG *tracks* these

ideographs when it does its work. In particular, there are ten of them

which are *not* compatibility variants of ideographs but have been added to

the compatibility block because they come from industrial, not national,

standards. These ten are singled out in the Unicode 3.0 book as part of

Unicode's ideograph set.

The IRG is already tracking this set of characters in its SuperCJK database,

as it's been stung twice by characters found therein getting added as if

they weren't already in the standard. Having a U-source number for them,

however, makes sure that their origins are marked and helps guarantee that

we'll continue to keep track of them.

##### Action items for L2

We need to make adoption of the changes to Vertical Extension B proposed by

the IRG part of our vote on 10646-2.

We need to pin down our own position on compatibility ideographs, viz., are

we willing to add more beyond the JIS X 0213 set? Do we support having a

formal block for adding them? Are we willing to help sponsor Taiwan's

proposal? How do we think the candidates for compatibility ideographs

should best be limited? Should radicals be included among the

"compatibility variants" for compatibility ideographs? What data -- beyond

compatibility variant, source, glyph, and radical/stroke information -- do

we need?

Apple would also encourage us to help forward *Japan's* request for new

compatibility ideographs, and we should have a strategy in mind to do this. JIS

X 0213 is all but final now. Once we know exactly what characters are

needed, we should try to get them added in an expeditious fashion,

presumably as an amendment to part 1 of 10646.

I'm sure there's more that will occur to me as soon as I send this (ain't

that always the case?), but this is getting long enough as it is, so I'm

going to cut off here.

=====

John H. Jenkins

[email protected]

http://www.blueneptune.com/~tseng