Unicode Technical Committee Meeting #62 (Toronto, Canada -- Sept. 30, 1994)

Discussion of Korean Hangul Proposal

Composite of notes taken by: Joan Aliprand, Steve Greenfield, Tim Greenwood, John Jenkins
Composite prepared by: Joan Aliprand

===========================================================================

Attendees:

Corporate Members:
  John McConnell, Apple         jmcc@apple.com
  Tim Greenwood, Digital         greenwood@r2me2.dec.com
  Mike Ksar, HP                 ksar@hpcea.ce.hp.com
  Don Carroll, HP               don_carroll@hpboi1.desk.hp.com
  John Gioia, IBM               gioia@vnet.ibm.com
  Fred Bealle, IBM              fbealle@vnet.ibm.com
  Alexis Cheng, IBM             alexis@vnet.ibm.com
  Hossein Kushki, IBM           kushki@vnet.ibm.com
  Marty Marchyshyn, IBM         martian@vnet.ibm.com
  Lisa Moore, IBM               lisam@vnet.ibm.com
  Uma Umamaheswaran, IBM        umavs@torolab6.vnet.ibm.com
  Ed Batutis, Lotus             ebatutis@lotus.com
  Sungi Hong, Microsoft         sghong@microsoft.com
  Young Lim, Microsoft          youngl@microsoft.com
  Lloyd Honomichl, Novell       lloyd_honomichl@novell.com
  Joan Aliprand, RLG            br.jma@rlg.stanford.edu
  John Jenkins, Taligent        john_jenkins@taligent.com
  Kelsey Bruso, Unisys          bruso@unirsvl.rsvl.unisys.com

Associate Members:
  John Bennett, Sybase          jrb@sybase.com

Individual Members:
  Tex Texin, Progress Software  texin@bedford.progress.com

Officers:
  Joe Becker, Xerox             becker.osbu_north@xerox.com

Liaisons:
  T.J. Kang, WG2-Korea Liaison

Unicode Office Manager:
  Steve Greenfield              unicode-inc@unicode.org

Guest:
  Dirk Vermeulen, CASEC

===========================================================================

Alternatives for Encoding Modern Hangul
---------------------------------------

Jenkins listed the alternative solutions to the problem of encoding all modern
hangul. This list was compiled at Taligent by Jenkins, Mark Davis, and David
Goldsmith. No additional alternatives were proposed at the UTC meeting. The
alternatives are designated "a" through "f". Alternative "f" (current status)
has two options (designated "1" and "2").
Although this list was presented in the middle of the UTC's discussion of the
issue, it is placed here at the beginning because it is a key element.

  a) Add the additional 4,516 hangul to the BMP
  b) Move all 11,172 hangul to another plane
  c) Put the additional 4,516 hangul into another plane
  d) Copy all 11,172 hangul to another plane
  e) Permanently shrink the user zone in the Unicode standard and put the
     additional 4,516 hangul into the former user space
  f) Do nothing (i.e., maintain current status)
     1) Use conjoining jamos
     2) Encode the additional 4,516 hangul in the user zone

Presentation by T.J. Kang on Unicode in Korea
---------------------------------------------

Kang outlined the events that have occurred since the June 1992 meeting of
ISO/JTC1/SC2/WG2 in Seoul, Korea. Revision of KSC 5601 was completed after the
1992 WG2 meeting and published as KSC 5601-1992. This adds the 11,172 "Johab"
precomposed syllables.

Scope of the principal standards that include modern hangul:

  KSC 5601-1987   2,350 precomposed modern hangul
  KSC 5657        1,800 modern and 2,000+ ancient hangul (a standard that no
                  one implemented)
  KSC 5601-1992   a new coding scheme for all 11,172 modern hangul

Modern hangul in each standard:

  KSC 5601-1992   11,172
  ISO 10646        2,350  (from KSC 5601)
                   1,800+ (part of KSC 5657)
                   2,000+ (the 2,000+ old hangul of KSC 5657 were deleted and
                           replaced by modern hangul)

The hangul from KSC 5657 were selected on the basis of frequency. The set of
2,000+ modern hangul that replaced the old hangul is in alphabetical order (not
ordered by frequency of occurrence). The remaining 4,516 hangul from
KSC 5601-1992 would take up three-quarters of the private use zone.

The Ministry of Commerce and Industry set up a committee to study the future
character set needs of Korea. The Committee recommended (in its report
published in November 1993) that Korea should go its own way and not use the
Unicode standard or ISO 10646. The modern part is finished; the Committee is
now studying ancient hangul.
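The counts in Kang's summary can be cross-checked with simple arithmetic. This
Python sketch assumes the standard modern jamo inventory (19 leading
consonants, 21 vowels, and 27 trailing consonants plus "no trailing
consonant"). The minutes do not state that inventory. The sketch is an
illustrative check only, not part of any standard.

```python
# Illustrative check of the hangul counts quoted in the minutes.
# Assumption (not stated in the minutes): the modern jamo inventory is
# 19 leading consonants, 21 vowels, and 28 trailing options (27 consonants + none).
LEADING_CONSONANTS = 19
VOWELS = 21
TRAILING_OPTIONS = 28

ADDITIONAL_HANGUL = 4_516  # syllables of KSC 5601-1992 missing from ISO 10646


def modern_syllable_count() -> int:
    """Each leading/vowel/trailing combination is one modern syllable."""
    return LEADING_CONSONANTS * VOWELS * TRAILING_OPTIONS


total = modern_syllable_count()
already_encoded = total - ADDITIONAL_HANGUL
print(f"{total:,} modern syllables; {already_encoded:,} already in ISO 10646")
# prints: 11,172 modern syllables; 6,656 already in ISO 10646
```

The remainder, 6,656, is consistent with the "2,350 + 1,800+ + 2,000+"
breakdown given above for ISO 10646.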
Korea's WG2 participants were initially excluded from the Committee's
deliberations, but were included after the Committee's report was published.
The general view was that the Korean delegates at the WG2 meeting in Korea had
let the nation down by not getting the full hangul set into ISO 10646. There is
enough flux within Korea that ISO 10646/Unicode could possibly be set aside and
ignored.

MS Windows is a major platform for most computer users in Korea. Currently, it
supports only the earlier version of KSC 5601. Some Korean companies are making
their own modifications using the full 11,000 hangul (johab) of KSC 5601-1992.
Angin (spelling?) has all johab and some ancient hangul; the company uses DOS
and wrote its own routines for its Windows version. HanSoft is doing the same
thing, but with a slightly different coding scheme. Standards people in Korea
are concerned about the proliferation of coding schemes.

|Windows is growing in popularity, and people are starting to splice johab
|support onto it
|Proliferation of de facto solutions

#The dominant Korean word processing software (used by 65-85% of
#the population) does its own character handling on DOS and supports all 11,172
#characters. It is now being ported to Windows, but does not use the Windows
#text API for text display. Another company is writing a driver supporting all
#11,172 characters, but with slightly different extensions to KSC 5601-1992
#(with regard to old Hangul and Chinese character support).

The Korean national character code committee is considering proposing the
addition of a complete set of precomposed hangul to ISO again. The arguments
for encoding precomposed hangul are:

  * economy of storage (one code, rather than several codes for conjoining
    jamos); and
  * backwards compatibility (MS is interested for this reason).

Koreans have not seen an implementation of Unicode/ISO 10646. MS is not sure
whether it will support conjoining jamos in Windows NT.
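The "economy of storage" argument can be shown by writing precomposed syllables
out as conjoining jamos and counting 16-bit code units. This Python sketch
assumes the Hangul layout later adopted in Unicode 2.0 (syllables from U+AC00,
conjoining jamos from U+1100). That layout is used for illustration only. It is
not the code point arrangement under discussion at this meeting.

```python
# Assumption: Unicode 2.0+ Hangul layout, not the 1994 arrangement.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed modern syllables


def decompose(ch: str) -> str:
    """Spell one precomposed modern syllable as 2 or 3 conjoining jamo."""
    s = ord(ch) - S_BASE
    if not 0 <= s < S_COUNT:
        return ch  # not a precomposed modern syllable; leave it alone
    l, rest = divmod(s, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l) + chr(V_BASE + v)
    if t:  # t == 0 means "no trailing consonant"
        jamo += chr(T_BASE + t)
    return jamo


def utf16_units(text: str) -> int:
    """Count 16-bit code units (all characters here are in the BMP)."""
    return len(text.encode("utf-16-be")) // 2


word = "\uD55C\uAE00"  # "hangul", written as two precomposed syllables
precomposed = utf16_units(word)
spelled_out = utf16_units("".join(decompose(c) for c in word))
print(precomposed, spelled_out)  # prints: 2 6
```

In this example each syllable costs one code unit when precomposed and three
when spelled out with conjoining jamos.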
The Korean national character code committee is responsible for reviewing
national standards as well as for WG2 participation. There is a difference of
opinion in the Committee on the development of national Korean character sets
versus adoption of an international standard (i.e., a national version of
ISO 10646).

#There are still some disputes in the Korean
#standards bodies about whether or not to support 10646.

T.J. Kang requested support from the Consortium if Korea proposes the addition
of hangul characters to the BMP.

|Korea's ISO group is hoping Unicode will go in with them on this issue

#The Korean standards body does not
#want to propose this to WG2 without Unicode support.

Becker: How stable is the 11,000 hangul set?
A (Kang with Hong): It is not an open set. The same hangul repertoire also
applies to North Korea.

Q. Is ancient hangul a separate issue?
A. Character coding is a passionate issue in Korea. There have been features in
the newspapers and on TV. Scholars want old hangul, as well as Chinese
(i.e., hanja?). The repertoire of 11,000 hangul meets only modern needs. The
need for old hangul is a minority opinion on the character set committee.
Because of the number and variety of old hangul, conjoining jamos may be used to
encode them.

Greenwood: A Korean colleague said that the johab were in the KSC 5601-1992
standard as an appendix, and were designated for internal use only, not for
interchange.
A. This statement reflects a compromise. Government computers still use the
earlier version of KSC 5601, and the government wanted to enforce its use for
communication. Small-scale LAN users, on the other hand, are using the new
standard.

Q. Use of private user space?
A. It was an out for the Koreans. Later, companies and Korean delegates felt
betrayed. Is the inclusion of the missing hangul in the BMP too good to be true?
Ksar (in his ISO capacity as Convenor of WG2) pointed out that the Korean
delegates to WG2 had not said anything about this in over three years, and had
not asked for it to be put on the agenda of WG2.

#Mike criticized Korea for being quiet about this for the past 2 years and
#suddenly springing this on us.

Kang: Koreans feel that they have nothing to lose by asking the UTC for support.

Presentation by S.G. Hong
-------------------------

The proposal is from Microsoft Corporation as a member company. Unless the
5,600+ characters are included in precomposed form, MS will not be competitive
in Korea. MS wants to comply with the Unicode standard and also provide
backwards compatibility. If a complete set of precomposed hangul is not
included in the main code space, MS would have to support the conjoining jamos
method. This would mean that a printer driver (for example) would have to
convert from conjoining jamos to KSC 5601-1992 encoding.

Jenkins: The driver has to do a conversion anyhow if the data is coming from
Unicode (i.e., if Unicode data is being directed to the printer).
Hong: This may be an MS-specific problem.
Honomichl: Are there things we are doing that we are not aware of and that
might cause us problems? If so, it would be good to know about them.
Hong: MS needs 1:1 mappings in driver APIs for printers, screens, etc. MS is
looking for a trivial 1:1 mapping, so that it can recompile to provide Unicode
functionality.

#Why is using conjoining jamos a problem? To use 5601 a printer driver has to
#convert from Unicode to 5601 fonts. Conversion has to be made from either
#precomposed Unicode or conjoining jamos. Answer: all the Microsoft APIs are
#based on 1-1 mapping from code page to Unicode.

Jenkins: This means that the need for a 1:1 mapping applies to all MS Unicode
implementations. The problem just surfaced in Korea.
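Jenkins' point that a driver must convert Unicode data in any case can be
illustrated by folding conjoining jamo sequences back into precomposed
syllables before any code page lookup. This Python sketch again assumes the
later Unicode 2.0 Hangul layout and handles only complete modern L+V(+T)
sequences. The function name `compose` is hypothetical. The sketch shows why
the mapping is not 1:1: two or three input code points become one output
character, and that 1:1 property is what Hong said the MS driver APIs rely on.

```python
# Assumption: Unicode 2.0+ Hangul layout, not the 1994 arrangement.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose(text: str) -> str:
    """Fold runs of conjoining jamo (L V [T]) into precomposed syllables.

    Characters that do not start a complete L+V(+T) sequence pass through.
    """
    out = []
    i = 0
    while i < len(text):
        l = ord(text[i]) - L_BASE
        if 0 <= l < L_COUNT and i + 1 < len(text):
            v = ord(text[i + 1]) - V_BASE
            if 0 <= v < V_COUNT:
                t = 0  # 0 means "no trailing consonant"
                if i + 2 < len(text):
                    cand = ord(text[i + 2]) - T_BASE
                    if 0 < cand < T_COUNT:
                        t = cand
                out.append(chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t))
                i += 3 if t else 2  # consumed 2 or 3 input code points
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


jamo = "\u1112\u1161\u11AB\u1100\u1173\u11AF"  # six conjoining jamo
print([f"U+{ord(c):04X}" for c in compose(jamo)])  # ['U+D55C', 'U+AE00']
```

A real driver would follow this step with its own table lookup into the target
encoding. That lookup is not shown here.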
|Microsoft's problem
|Double API structure of Windows requires Unicode to be basically the same as
|the "native" code page for any language
|(That is, people would have to actually rewrite their printer drivers and
|what not so that they no longer make casual assumptions about the structure
|of the code set)

Becker: Two different pleas. Why can't the current status get us somewhere?
A. It does not satisfy the belief of Koreans that the whole set should be in
there.
But why isn't it (current status) OK for MS?

#Joe: why not just put it in the private use area? Ans.: we want to comply
#with the std and promote Unicode.

Hong: MS has been promoting the Unicode standard to ISVs. But the Product
Development Group found problems: they say they cannot ship product. There is
debate within MS whether to use Unicode or DBCS for Korean.

Jenkins: Because MS is using private user space for Shift-JIS, there is a
conflict with Korean. There is insufficient room in the user zone for both
Shift-JIS and the 4,516 hangul. (Option f2 conflicts with the MS implementation
for Shift-JIS.)

Jenkins presented the alternatives (see above), and said that Taligent does not
consider alternatives b, c or d to be viable options.

Kang: Alternative d corresponds to a recommendation in the study issued by the
Korean character set committee.

Hong: MS would prefer Alternative e (because of the conflict between Shift-JIS
and hangul). This may be a viable option in the MS view.

Greenwood: Alternative e means a break with ISO 10646.

This point was discussed. The private use zone is not part of the conformance
clause of ISO 10646, so such a change would not make the Unicode standard
non-conformant (at least, legalistically). However, Alternative e would mean a
non-compatible upgrade of the Unicode standard.

Hong: MS is looking for consensus on how to use the user zone for hangul. MS
cannot agree on Shift-JIS use.
The UTC saw several problems with Alternative e:

  * The Unicode user zone is made permanently smaller;
  * It breaks with ISO 10646;
  * It breaks with existing implementations (e.g., the problem of gaiji
    characters in Japanese applications).

Ksar: Where do UTF-16 and UCS-4 fit into this? Has Korea considered a Korean
national variant of ISO 10646?

Hong asked for advice on strategies for implementing hangul.

The initial opinion of the UTC was that the best long-term strategies are
Alternative f1 (conjoining jamos) or Alternative b (move all 11,000 combined
hangul to another plane). However, Ksar pointed out that Alternative b would
mean altering code point assignments of ISO 10646.

When the eventual need to convert data from Alternative f2 (some hangul in the
user zone) is taken into account, the UTC's opinion was that Alternatives c
(4,516 combined hangul in another plane) or d (all 11,000 hangul in another
plane) would be better. These alternatives parallel current combined
single-byte/double-byte systems, with which Korea has considerable experience.
The alternatives are also consistent with UTF-16, which has been approved by the
UTC. Alternative c is preferable to Alternative d, as it avoids duplicate
encoding of the same hangul.

McConnell said he is against putting hangul in extra planes, because they are
encoding variants, not presentation variants. He does not like the prospect of
multiple encodings (i.e., conjoining jamos and precomposed hangul).

Jenkins: Prefers putting them on another plane to putting them in the BMP. The
problem with Alternative f2 (additional hangul in the user zone) is that users
are not *required* to use particular encoding values. Because this is user
space, companies could declare their own arrangement for encoding these
characters.

Greenwood argued against changing the principles of the Unicode standard just
because of simplistic implementation choices.

Bennett asked Hong: What happens when you receive data encoded with conjoining
jamos?
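The remark that Alternatives c and d are consistent with UTF-16 depends on
UTF-16's ability to reach code points beyond the BMP through a pair of 16-bit
code units. This Python sketch follows the surrogate-pair arithmetic as it was
later standardized. The plane-4 code point in the example is hypothetical and
is chosen only to show where "another plane" would put a syllable.

```python
def to_utf16(code_point: int) -> tuple[int, ...]:
    """Encode one code point as UTF-16 code units (surrogate pair beyond the BMP)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the UTF-16 range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        return (code_point,)            # BMP: a single 16-bit unit
    offset = code_point - 0x10000       # 20 bits to split across two units
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return (high, low)


def from_utf16(units: tuple[int, ...]) -> int:
    """Decode one BMP unit or one surrogate pair back to a code point."""
    if len(units) == 1:
        return units[0]
    high, low = units
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


HYPOTHETICAL_PLANE4_SYLLABLE = 0x4AC00  # hypothetical location, illustration only
units = to_utf16(HYPOTHETICAL_PLANE4_SYLLABLE)
print([hex(u) for u in units])  # ['0xd8eb', '0xdc00']
```

Under this scheme a syllable placed in another plane costs two code units
instead of one. That trade-off resembles the one in the combined
single-byte/double-byte systems mentioned above.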
Hong: Are member companies willing to pursue Alternative a (additional hangul in
the BMP proper)?

The consensus of the UTC was that MS needs to present a detailed cost/benefit
analysis of all the alternatives, to explain why MS prefers Alternative a.
Jenkins pointed out that, in the short term, MS can do Alternative f2 and would
be compliant with the Unicode standard. The UTC needs solid reasons explaining
why Alternatives c, d and f are not going to happen in Korea.

Jenkins said that asking the UTC to support Alternative a amounts to telling it
to choose between Korea and Taiwan (which has proposed a collection of
additional ideographs). Alternative a would also mean rejecting the German
proposal for additional characters.

Gioia: This brings in an additional factor: what is happening at the ISO level?
The Koreans will have to compete with everyone else for the unassigned code
points of the BMP.

Ksar: The Koreans have not even made a proposal to ISO for adding the 4,000+
hangul to the BMP.

Greenwood: The Korean national standards body should propose the addition of
the hangul to WG2. It would be improper for the Unicode Consortium to make such
a proposal. [No UTC member disagreed with this statement.]

#For WG2 this should be proposed by the Korean national body.

Hong: Will bring the alternatives to the Product Group, ask them to evaluate
each, and say how hard each would be to do.

Ksar: It is important for MS to continue to participate in the Unicode Technical
Committee and WG2. Unicode, Inc. is a liaison to WG2, so any resolutions of the
UTC have to be presented to WG2 for a vote. There are real advantages for a
company in being part of this process. Korea is not the only country that wants
to add characters to the BMP. The issues are described in ISO document N884
(copies were distributed at the UTC meeting).
MS needs to provide more information to convince the UTC that one alternative
is better than the others, and that this alternative is better for other
companies as well as for MS. After that, ISO must be convinced as well.

#UTC needs much more information from Microsoft on why c, d and e are bad.
#To convince UTC to support some proposal we need solid figures showing that
#adding these characters is not just good for Microsoft, but good for all the
#member companies.

There is no question that the UTC is firmly committed to keeping ISO 10646 and
the Unicode standard in sync. Those present at the meeting expressed vehement
support for this.

Action item (Greenwood, Aliprand, Greenfield, others?): Send a copy of personal
notes (draft Minutes in the case of Greenfield) to S.G. Hong and T.J. Kang.
[DONE per this composite]

Action item (S.G. Hong): Prepare a detailed cost/benefit analysis of all the
alternatives. State which alternative MS prefers (and why). Give reasons why
this alternative may be better for other companies too. This analysis is to be
submitted to the UTC.

|The response
|We need from Microsoft:
|More information to convince UTC that (a) is a better option than (d) or (f)
|Arguments to take to the rest of the world to convince it

=END=