L2/01-477

From: John H. Jenkins [jenkins@apple.com]
Sent: Monday, December 10, 2001 11:58 AM
Subject: Report on IRG #18 (Tokyo, Japan, 3-6 December 2001)

The eighteenth meeting of the IRG was held in Tokyo, Japan, from 3 to 6 December 2001. I attended as Unicode's liaison and Hideki Hiura from Sun attended as L2's representative. All IRG members except Singapore were in attendance: mainland China, Taiwan, the Hong Kong SAR and Macao SAR, Japan, North Korea, South Korea, Vietnam, and the US.

The main foci of the meeting were (a) the final revision to mapping data from the DPRK (North Korea) to be included in ISO/IEC 10646-1 and -2, and (b) submissions for Vertical Extension C. There were some other matters of interest to Unicode that came up.

The other matters first.

Hong Kong had a list of six characters that are in HK SCS that they wanted to have added to Unicode/10646. Four of them are precomposed Latin letters, ê and Ê with macrons and carons; the other two are engineering symbols. I told them that precomposed Latin letters can no longer be added to the standard. Their response was that this wasn't so much an issue as whether or not glyphs for these characters would be available in existing applications and with existing operating systems.

The upshot is that I'm looking into drafting a TR which lists precomposed Latin glyphs which have been rejected for addition to Unicode but which should be available for general use. The TR would provide guidelines for font designers and companies producing operating systems. Right now, the tendency is to look through existing Unicode blocks when deciding what precomposed glyphs should be added to fonts; this would allow them to go beyond that.

I reported on actions taken at the last UTC with regard to Han variants. Variants in general were at the heart of the problems with Vertical Extension C. South Korea, for example, brought to the table a proposal for Extension C which includes a total of some 26,000 characters.
These are derived from an existing system representing the Korean version of the Tripitaka (Buddhist scriptures). It's clear that the Koreans simply took over the repertoire without really considering whether or not it contained unifiable variants. For example, there are 41 ideographs contained therein which are simply variants of U+9F9C (turtle).

Basically, as I describe below, the IRG is going to move forward as it has in the past and not accept proposals which contain a large number of unifiable variants. Unless and until WG2 instructs it otherwise, this will continue to be its policy. It does anticipate, however, instruction from WG2 on the subject.

I also forwarded to Taiwan the email received on unicode@unicode.org regarding holes in the CNS 11643-1992 mappings used by the IRG. They will investigate the matter.

The IRG's editor also found some errors in the Hanyu Da Zidian mapping data the IRG uses. She'll work with Richard Cook to resolve them. There are additionally some characters from the KangXi and Hanyu Da Zidian still missing. China will double-check and add them to their proposal if needed.

So far as the DPRK mappings were concerned, the North Koreans proposed five revisions to the mappings they've submitted before. I'd already checked the earlier mappings for consistency and completeness, and these revisions were consistent with the earlier mappings. They were accepted and will be forwarded to WG2 by the end of this week.

So far as Extension C is concerned, pretty much everybody had proposals. They divided neatly into two groups:

Small proposals:
    Macao          ~22
    Singapore      ~25
    Hong Kong      ~35
    Unicode        ~100

Large proposals:
    China          ~6500
    Japan          ~1000
    Taiwan         ~14000 (largely personal and place names)
    South Korea    ~26000
    Vietnam        ~2200

Hong Kong and Unicode argued that it would be best for the IRG to take these in two lots: the small proposals, then the large ones. The small ones could be reviewed quickly and a complete Extension C finished by the end of 2002.
The large ones could then proceed at the more leisurely pace they would require. Both Taiwan and South Korea objected to this.

In the end, it was agreed that all the proposals should be submitted by April 2002. Proposals will be divided into two categories: Extension C1 and Extension C2. (Division into Extensions C and D was objected to, but I think that's how it will end up.) Extension C1 submissions will basically consist of the clearly distinct, non-unifiable characters each member wants. Extension C2 will consist of those characters where unification may be possible and where more time will be needed for analysis.

The goal here is to process Extension C1 fairly quickly. This is important to Hong Kong; they're trying to convince the SAR government to standardize on Unicode, but it won't until every character it needs is in the standard.

Rather rigid criteria for the proposals were agreed on. If a C1 proposal contains more than 5% unifiable variants, as determined by the IRG's editor, it will automatically be moved to C2.

One further note on Extension C. The IRG has basically abandoned the four-dictionary sorting algorithm described in TUS 3.0. In its place, Hong Kong suggested we use the standard five-stroke sorting algorithm which is used in some dictionaries. This was adopted.

Other stuff:

Future IRG meetings will be held in Macao (6-10 May 2002, which is actually *before* the next WG meeting), Hanoi (18-22 November 2002), and Zhuhai (May 2003). The interesting side effect of this is that there will then be three spring meetings in a row in Guangdong province.

The IRG is moving to all-electronic communications, and Mr. Zhang is trying hard to crack down on people not getting stuff in *before* the meetings.

The IRG does *not* like WG2's idea of going to a single-column format for Han in 10646. I offered that Unicode could provide a TR with the multi-column format if they wanted to go that route.

They also didn't like WG2's resolution on fonts.
They were concerned that it would give the 10646 editor too much power to change glyphs without proper review, among other things.

==========
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/