======================================================================== Date: Wed, 22 May 91 10:44:38 EDT From: Edwin Hart Subject: New LISTSERV??? To: Jim Jones Dear Jim, Can you create yet another LISTSERVe for me, please? I expect this one to have a fairly short lifetime and hope that we will not need it after the end of 1991. I need it for some ad hoc committee work. It does not need security but I believe that WE SHOULD AVOID ADVERTISING THAT IT EXISTS at least for now. name: 10646M purpose: For discussion of merging ISO DIS 10646 and Unicode into one global multibyte code. If questions, please let me know. I set the MAIL option back on for Ksar on the ANSIX3L2 list. If we have problems again, we can turn it off. I think he gets too much mail queued on his system and then we get the resulting problems reflected back to us. Thanks for all of your support, Ed Ú Edwin Hart Jim Jones 05/22/91 New LISTSERV??? ======================================================================== Date: Wed, 22 May 91 11:01:02 EDT From: Edwin Hart Subject: Other compaction methods To: Olle Jarnefors Dear Mr. Ja"rnefors, It was a pleasure meeting you at the WG2 meeting. I have mailed the two documents you requested: SHARE Europe, National Language Architecture white paper SHARE, ASCII and EBCDIC Character Set and Code Issues in SAA white paper I hope your stay was pleasant and that you had a safe trip back. I took the liberty of placing your name on the task list for the task of determining the other compaction methods to be specified in 10646M beyond a 2-octet form for the base multilingual plane and the 4-octet canonical form. I thought we needed some balance by having coordinators outside of North America, especially since the people from Unicode only want a 2-byte and 4-byte compaction method, and several of the European standards bodies have previously stated that they need a 1-byte compaction method. 
Best regards, Ed Hart Ú Edwin Hart Olle Jarnefors 05/22/91 Other compaction methods ======================================================================== Date: Wed, 22 May 91 11:18:18 EDT From: Edwin Hart Subject: 10646M action item for you? To: Karen Smith-Yoshimura , Wayne Davison Dear Karen and Wayne, Between the two of you, can you coordinate the review of how floating accents are to be handled in 10646M with TC46 and CCITT, and others? I will take the liberty of placing your name on the task list for these. Due date would be August 15. Thanks, Ed Ú Edwin Hart Karen Smith-Yoshimu 05/22/91 10646M action item for you? ======================================================================== Date: Tue, 28 May 91 08:07:06 EDT From: Edwin Hart Subject: Re: Current Draft of Ad Hoc Meeting To: Olle Jarnefors In-Reply-To: Your message of Mon, 27 May 91 20:28:22 +0200 >1. Notice of delivery >===================== > >One of the disadvantages of current email protocols is that you >don't get a notice of delivery. Since I haven't got an answer >from you to my last letter I'm not sure that the communication >works from me to you (it's certainly OK in the opposite >direction). Could you please send me a note that you have >received this letter (by email or to my fax no. +46-8-10 25 10)? > >To be on the safe side I will also send you this letter by fax. I am sorry I did not immediately respond. I did receive your 3 notes. I am responding to all of them now. I will try to be MUCH more careful about responding quickly to mail. >2. The list of electronic addressees >==================================== > >> "Olle Ja{rnefors" , > >The _international_ form of my name is "Olle Jarnefors". The >original form is "Olle Järnefors", which >becomes "Olle J{rnefors" when coded with the Swedish variant of >ISO 646, which is used in electronic mail. > >> John Jenkins , Mike Ksar , > >Two names on the same line, Mike might easily be overlooked. 
> >> Mike Ksar , > >Should be: Takayuki K Sato , I fixed Sato-san's name. (I was lazy and simply copied Mike Ksar's entry and failed to fix the name.) I fixed your entry to J{rnefors. >3. New distribution list >======================== > >> I have asked the University to create a new 10646M electronic distribution >> (BITNET LISTSERVe). As soon as it is available, I will add your names to the >> list. > It is not yet available. When I wrote the note, I thought it was ready but it was not. I sent a polite note to the University asking about it again on Saturday but Monday was a holiday here. Perhaps today? >4. Problem with control characters >================================== > >I found the following unexpected control characters in your >previous email messages. At least one of them is potentially >disastrous: CTRL-Z is treated as end-of-textfile by many MSDOS >programs. >
>Control character    Should have been   Appropriate substitute in email
>=================    ================   ===============================
>CTRL-G (7 '7 "7)     Some bullet char   *
>CTRL-Z (26 '32 "1A)  Underline on/off   _
> >It is common to indicate underlining or italicizing one or more >words _like this_ in email. Bold-face type is often indicated >*like this*. Thanks for this information. In creating the text for E-mail, I reformatted the document for a fixed, 10 characters per inch (25 mm) and did not think about the bullet items. >5. List of participants >======================= > >> Olle J > >Should be: >Olle Jarnefors Royal Institute of Technology, Sweden > >> Karen Smith-Yoshimura Research Lib > >Should be: >Karen Smith-Yoshimura The Research Libraries Group, USA > I just checked these. Apparently, when the was translated somewhere, it truncated the rest of the line for your name. I added "The" to RLG for Karen. Apparently, we had another truncation here in the mail. Did I make the correct changes? >6. 
C0-C1 restriction >==================== > >> In addition, pending a careful review by computer >> communication, systems, and applications experts, from ISO, >> ECMA, CCITT, and within our enterprises, we believe it >> extremely desirable to allow encoding graphic characters in >> the _C0_ space presently reserved in DIS 10646. This refines >> point 2 from the Canadian proposal. Annex ____ provides more >> details on this particular refinement (the _Bohn_ refinement, >> named for Willy Bohn, who proposed it) of the ECMA proposal. >> Vote Thursday: 16 for/ 0 against/ 3 abstain (Bishop, >> Hasegawa, Sato) > >I would like to suggest two improvements of this point: > >1) Insert "among other parties" after "applications experts, > from". (This is to not exclude important experts like the > Internet Engineering Task Force (IETF) and enterprises not > represented at the meeting.) The intent was to include the appropriate experts with a list of the minimum set of organizations to be considered. > >2) Delete the word "extremely" in "we believe it extremely > desirable to allow". My own notes have no indication of that > word being used at the meeting. The relevant part is: > "Hart: Vote on: Can you live with > o It's desirable to use C0/C1 for graphic characters. > o Code C0/C1 according to Bohn's proposal subject to an > investigation by a group of communication experts. > o Paper for the expert group which describes proposals to use > C0 (Bohn, ECMA, China). > o The expert group shall report on if these proposals are > feasible. > o What experts: Companies, CCITT, ECMA, ISO." > > Since I can accept the statement that it's desirable to allow > C0 for graphical characters but don't think that this is > _extremely_ desirable (C0 space is needed only for very rare > ideographic and Hangul characters which anyway will spill over > into another plane eventually) I would have protested if that > wording had been used for voting in the meeting. 
I will remove the word "extremely" as you suggested. I have not decided about the other point yet because I thought our intent was . . . careful review by experts (from the computer communications, systems, and applications disciplines within our enterprises and from ISO, ECMA, CCITT, etc.), we believe it desirable . . . >7. Non-spacing marks >==================== > >> The third Code Extension Level >> should specify: >> >> a. In addition to diacritics, non-spacing marks should >> include stress marks, tone marks, and those used for text >> processing operations such as underlining or mathematical >> notation for the name of a vector. >> b. Non-spacing marks should follow the base character for >> consistency. >> c. Imaging and the order of multiple non-spacing diacritics >> should follow well-defined rules. (See Annex ____.) >> d. To allow for compliance with future versions of 10646 >> which may encode additional pre-composed characters, >> allow both encoding a character as a pre-composed >> character or as a base character with one or more >> non-spacing marks. (That is, delete the ECMA statement _if >> the accented letter is already coded as a single >> character, the alternative representation by means of >> floating diacritical marks is not allowed._) > >According to my notes we also decided that "all sequences of >codes should be allowed". (Joe Becker had argued that there is >no practical way to enforce a legislation against certain >sequences of codes.) > >8. Action item 14 >================= > >> 14. (Point 7) Determine the compaction methods to be proposed in >> _Part 4_. (J > >I hope this "J" isn't the start of "Jarnefors"! Can one person >determine this question? I thought it was controversial. I'll change the wording to Coordinate an investigation of the compaction methods . . . I think this is a case where the people from Unicode do not care a lot about any other compaction methods but the one for 2-bytes. Thank you for your clarifying remarks. 
I found them to be valuable to the document (and some additional e-mail problems I must work around). Ed Hart Ú Edwin Hart Olle Jarnefors 05/28/91*Current Draft of Ad Hoc Meeti ======================================================================== Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2695; Tue, 28 May 91 08:44:47 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 8440; Tue, 28 May 91 08:44:28 EDT Received: from vnet.ibm.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 08:43:57 EDT Received: from RALVMK by vnet.ibm.com (IBM VM SMTP V2R1) with BSMTP id 3601; Tue, 28 May 91 08:42:52 EDT Date: Tue, 28 May 91 08:42:04 EDT From: andersen@ralvmk.vnet.ibm.com To: hart%APLVM.bitnet@cunyvm.cuny.edu Subject: Ad-hoc paper 1) The paper is fine with me 2) Yes 3) Yes Regards, Jerry Ù ANDERSEN RALVMK 5/28/91 Ú andersen@ralvmk.vne hart%APLVM.bitnet@c 5/28/91 Ad-hoc paper ======================================================================== Received: from MITVMA.MIT.EDU by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2992; Tue, 28 May 91 17:16:37 EDT Received: from MITVMA by MITVMA.MIT.EDU (Mailer R2.05) with BSMTP id 7137; Tue, 28 May 91 17:16:04 EDT Received: from relay1.UU.NET by mitvma.mit.edu (IBM VM SMTP R1.2.1MX) with TCP; Tue, 28 May 91 17:16:01 EDT Received: from microsoft.UUCP (LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA19955; Tue, 28 May 91 17:14:41 -0400 From: microsoft!asmusf@uunet.uu.net Message-Id: <9105282114.AA19955@relay1.UU.NET> To: uunet!APLVM.BITNET!HART@uunet.UU.NET Cc: ibmtoron?schein@uunet.UU.NET, michelsu@uunet.UU.NET, sun!AppleLink.Apple.com!davis.mark@uunet.UU.NET, sun!Xerox.COM!Joseph_D._Becker.osbunorth@uunet.UU.NET, sun!metaphor.com!whistler@uunet.UU.NET Subject: Re: Current Draft of Ad Hoc Meeting Date: Tue May 28 13:47:14 1991 I am substantially pleased with your minutes and don't think that I have objections other than 
perhaps minutiae that would prevent it from being distributed to the intended audience. I agree with Isai that speed is important. A. Ù MICROSOF UUNET 5/28/91 Ú microsoft!asmusf@uu uunet!APLVM.BITNET! 5/28/14*Current Draft of Ad Hoc Meeti ======================================================================== Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3028; Tue, 28 May 91 19:16:38 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 6718; Tue, 28 May 91 19:16:19 EDT Received: from uucp-gw-1.pa.dec.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 19:16:16 EDT Received: by uucp-gw-1.pa.dec.com; id AA29087; Tue, 28 May 91 16:16:59 -0700 Received: by mts-gw.pa.dec.com; id AA05251; Tue, 28 May 91 16:04:28 -0700 Message-Id: <9105282304.AA05251@mts-gw.pa.dec.com> Received: from decwet.enet; by decpa.enet; Tue, 28 May 91 16:04:28 PDT Date: Tue, 28 May 91 16:04:28 PDT From: "F. Avery Bishop 28-May-1991 1602" To: hart%aplvm.bitnet@cunyvm.cuny.edu Subject: RE: Your Endorsement and JTC1 mailing Q1: I endorse the statement as written. Q2: I would prefer a formal mailing if it can be done within the bylaws of the relevant bodies. Q3: Either is OK Avery Ù BISHOP DECWET 5/28/91 Ú F. 
Avery Bishop 28 hart%aplvm.bitnet@c 5/28/91*Your Endorsement and JTC1 mai ======================================================================== Received: from MITVMA.MIT.EDU by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3101; Wed, 29 May 91 04:21:49 EDT Received: from MITVMA by MITVMA.MIT.EDU (Mailer R2.05) with BSMTP id 4001; Wed, 29 May 91 04:23:32 EDT Received: from relay1.UU.NET by mitvma.mit.edu (IBM VM SMTP R1.2.1MX) with TCP; Wed, 29 May 91 04:23:30 EDT Received: from microsoft.UUCP (LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA18879; Wed, 29 May 91 04:22:17 -0400 From: microsoft!michelsu@uunet.uu.net Message-Id: <9105290822.AA18879@relay1.UU.NET> To: uunet!APLVM.BITNET!HART@uunet.UU.NET Cc: asmusf@uunet.UU.NET, ibmtoron?schein@uunet.UU.NET, sun!AppleLink.Apple.com!davis.mark@uunet.UU.NET, sun!Xerox.COM!Joseph_D._Becker.osbunorth@uunet.UU.NET, sun!metaphor.com!whistler@uunet.UU.NET Subject: Re: Current Draft of Ad Hoc Meeting Date: Wed May 29 10:15:21 1991 I agree with Asmus and Isai comments. I already sent an answer to Ed but the message bounced. TRying again. Michel Suignard | To: uunet!APLVM.BITNET!HART | Subject: Re: Your Endorsement and JTC1 mailing | Date: Tue May 28 20:41:09 1991 | | | Question 1: So far, I have only received one response to the current draft. | | Please send me E-mail that states either 1) you endorse the statement as | | written or 2) you have concerns and you do not endorse the paper. | | I endorse the statement as stated. I really feel that if we start now | arguing about the sentences we will never stop. I fully agree with Isai | on this topic. | | The only remark I have about the document are the incomplete references to | 2 annexes: | 1) Willy Bohn proposal, | 2) Floating marks. | The annexes should be there and there references set accordingly or I could | also survive with their removal as long as they are not referred to. 
| | | Question 2: Do you want to distribute the document INFORMALLY as agreed? | | informally or formally through JTC1 (put my name as one of the experts) | | Again I agree with Isai. I don't care about formally or informally but for | sure I want it to be distributed to JTC1/SC2 recipients. | If you want to go with the formal way then I have no problem with your | proposed wording. | | Finally to get a chance to be endorsed by the French body I had to start | circulating the document in its current shape (with a note about possible | change). As long as you change the date on the final document (from May | 23rd's which is the version I have circulated) it will be fine. | | Michel Suignard | Ù MICROSOF UUNET 5/29/91 Ú microsoft!michelsu@ uunet!APLVM.BITNET! 5/29/21*Current Draft of Ad Hoc Meeti ======================================================================== Received: from SEARN.SUNET.SE by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2815; Tue, 28 May 91 12:47:22 EDT Received: from SEARN by SEARN.SUNET.SE (Mailer R2.05) with BSMTP id 1185; Tue, 28 May 91 18:50:42 +0200 Received: from kth.se by SEARN.SUNET.SE (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 18:50:33 +02 Received: by kth.se (5.61+IDA/KTH/LTH/6.0) id AAkth23818; Tue, 28 May 91 18:47:17 +0200 Date: Tue, 28 May 91 18:47:17 +0200 From: Olle Jarnefors Message-Id: <9105281647.AAkth23818@kth.se> To: HART@APLVM.BITNET, ojarnef@admin.kth.se Subject: Re: Current Draft of Ad Hoc Meeting Just some small points: > >2. The list of electronic addressees > >==================================== > > > > "Olle Ja{rnefors" , > > > >The _international_ form of my name is "Olle Jarnefors". > I fixed your entry to J{rnefors. It is better to use the form "Jarnefors". ("J{rnefors" is only understandable for people in the Scandinavian countries and in Germany using national 7-bit codes instead of the REAL Ascii.) > >5. List of participants > >======================= > > Did I make the correct changes? 
I think so, yes. > >6. C0-C1 restriction > >==================== > > I have not decided about the other point yet because I thought our intent was > > . . . careful review by experts (from the computer communications, systems, > and applications disciplines within our enterprises and from > ISO, ECMA, CCITT, etc.), we believe it desirable . . . That wording is OK. > >7. Non-spacing marks > >==================== > > > >According to my notes we also decided that "all sequences of > >codes should be allowed". (Joe Becker had argued that there is > >no practical way to enforce a legislation against certain > >sequences of codes.) Do you have any comment on this suggested addition? > >> I took the liberty of placing your name on the task list for the task of > >> determining the other compaction methods to be specified in 10646M beyond > >> a 2-octet form for the base multilingual plane and the 4-octet canonical > >> form. I thought we needed some balance by having coordinators outside of > >> North America, especially since the people from Unicode only want a 2-byte > >> and > >> 4-byte compaction method, and several of the European standards bodies have > >> previously stated that they need a 1-byte compaction method. > > I think the people from Unicode do not care if any other compaction > methods are used as long as they have their 2-byte mode and it is the > default. (This is my opinion and may not be true.) > > I believe the question is: What should the other compaction methods be? > Should 1-byte compaction be allowed? > I understand from Mike Ksar that this is very important to many > countries. > Should 3-byte compaction be allowed? > This may have value for the ideographic scripts. Check with C/J/K > countries. > Should compaction mode 5 (mixed number of bytes per character in the > data stream) be allowed? > I think these are the issues. Please review them and make a > recommendation and state the reason for the recommendation. 
So the "J" in Action Item 14 _did_ mean "Jarnefors" then! Of course I accept this assignment. Shall I prepare the recommendation to the next ad hoc meeting in Geneva and send it to the new distribution list around 5 Aug? I would like to contact those that took part in the discussion on compaction forms. Do you remember which persons did do that? > Thank you for your clarifying remarks. I found them to be valuable to the > document (and some additional e-mail problems I must work around). And thank _you_ for your persistent effort to bring the two sides closer together, for arranging the informal meeting in San Francisco and for producing the excellent minutes. We seem to almost have found the type of compromise that both SHARE and ITS (Swedish standards body) have asked for. Ù OJARNEF ADMIN 5/28/91 Ú Olle Jarnefors HART@APLVM.BITNET 5/28/91*Current Draft of Ad Hoc Meeti ======================================================================== X-Delivery-Notice: SMTP MAIL FROM does not correspond to sender. Received: from APLVM (SMTP) by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2917; Tue, 28 May 91 14:50:52 EDT Received: from mailer.jhuapl.edu by APLVM.JHUAPL.EDU (IBM VM SMTP R1.2.1) with TCP; Tue, 28 May 91 14:50:50 EDT Received: by mailer.jhuapl.edu (5.57/1.12) id AA06832; Tue, 28 May 91 14:51:30 EDT Received: from zarasun.Metaphor.COM by relay.metaphor.com (4.1/SMI-4.1) id AA02105; Tue, 28 May 91 11:44:33 PDT Received: by zarasun.Metaphor.COM (4.1/SMI-4.0) id AA09690; Tue, 28 May 91 11:45:00 PDT Date: Tue, 28 May 91 11:45:00 PDT From: whistler@zarasun.metaphor.com (Ken Whistler) Message-Id: <9105281845.AA09690@zarasun.Metaphor.COM> To: hart@aplvm.jhuapl.edu Subject: 10646M Minutes Cc: kernaghan@hq.m4.metaphor.com, whistler@zarasun.metaphor.com Ed, I am reviewing the revised minutes right now and will send shortly any suggestions I note. 
Metaphor concurs with Isai's note that the overriding concern is to distribute the document quickly, as minutes to the meeting. We do not take it as a formal proposal yet, and do not much care exactly what channels the document is distributed in. I have in hand Olle's note, and have to agree with him about the ^G and ^Z codes. Don't use them in email which hits the Internet--I had to edit them all out of the document, though fortunately they did not result in any truncations. Note, though, that the ^? in Olle J^?rnefors (Ja"rnefors) name DID result in a critical truncation of the document, since it represented an assignment of responsibility to him! --Ken Whistler Ù WHISTLER ZARASUN 5/28/91 Ú Ken Whistler hart@aplvm.jhuapl.e 5/28/91 10646M Minutes ======================================================================== X-Delivery-Notice: SMTP MAIL FROM does not correspond to sender. Received: from APLVM (SMTP) by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3017; Tue, 28 May 91 18:44:22 EDT Received: from mailer.jhuapl.edu by APLVM.JHUAPL.EDU (IBM VM SMTP R1.2.1) with TCP; Tue, 28 May 91 18:44:09 EDT Received: by mailer.jhuapl.edu (5.57/1.12) id AA08723; Tue, 28 May 91 18:44:49 EDT Received: from zarasun.Metaphor.COM by relay.metaphor.com (4.1/SMI-4.1) id AA02322; Tue, 28 May 91 15:36:32 PDT Received: by zarasun.Metaphor.COM (4.1/SMI-4.0) id AA09711; Tue, 28 May 91 15:36:56 PDT Date: Tue, 28 May 91 15:36:56 PDT From: whistler@zarasun.metaphor.com (Ken Whistler) Message-Id: <9105282236.AA09711@zarasun.Metaphor.COM> To: hart@aplvm.jhuapl.edu Subject: 10646M Minutes --Notes Cc: "ma_hasegawa"@jrdv04.enet.dec.com, BL.KSS@rlg.stanford.edu, BOJENS@cphvm1.vnet.ibm.com, DAVIS.MARK@applelink.apple.com, JVANSTEE@stlvm7.vnet.ibm.com, PAECH1@ghqvm1.vnet.ibm.com, SCHEIN@torolab5.vnet.ibm.com, Takayuki_K_Sato%e2@hp8900.desk.hp.com, andersen@alvmk.vnet.ibm.com, becker.OSBU_NORTH@xerox.com, bishop@decwet.enet.dec.com, ecoling@applelink.apple.com, jenkinsj@apple.com, 
kernaghan@hq.m4.metaphor.com, ksar@hpcea.ce.hp.com, microsoft!asmusf@uunet.uu.net, microsoft!michelsu@uunet.uu.net, ojarnef@admin.kth.se, whistler@zarasun.metaphor.com Ed, Here are my more picky notes on the draft, followed by a couple of more hefty substantive comments on two of the points regarding Areas of Consensus. I concur with Olle's comments re wording of point 6. C0-C1 restriction and 7. Non-spacing marks. Concur with comments circulating re removing the schedule of the next ad hoc meeting in Geneva from paragraph 5 of the Summary. Paragraph 2 of Summary: "favored the Unicode" ==> "favored Unicode". List of Participants: Karen Smith-Yoshimura, The Research Libraries Group Also: "Mr. Stee" ==> "Mr. Van Stee" 3. Problems of code conversion, 2nd. bullet: "dependant" ==> "dependent" 5. "Others advantages" ==> "Other advantages" Areas of Consensus 1., 3rd paragraph, last sentence: "insure" ==> "ensure" ===================== Substantive fixes: Areas of Consensus 4., 2nd paragraph: "...the Canadian proposal to expand Unicode into a 4-octet code." This can be misinterpreted as a proposal to make Unicode a 32-bit code. What is intended is that the numerical values of 16-bit Unicode characters be identical to the numerical values of 32-bit values of 4-octets taken as a single character. (This is consistent with Area of Consensus 10, regarding Wide-Character interpretation.) To be more explicit: U+0041 = 10646M 000/065 LATIN CAPITAL LETTER A = 10646M 000/000/000/065 LATIN CAPITAL LETTER A where the two-octet form of 10646M is the unannounced BMP value, and when placed in a CPU register has the value decimal 65, and where the four-octet form of 10646M is the canonical value, and when placed in a CPU register has the value decimal 65. 
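The octet notation used above (for example 000/000/000/065) can be read as base-256 digits, most significant octet first. The following is only a minimal sketch in C of that reading; the function name octets_to_value is hypothetical and is not taken from DIS 10646, Unicode, or any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret n octets (n <= 4), most significant
 * first, as the single unsigned value a CPU register would hold. */
uint32_t octets_to_value(const uint8_t *octets, size_t n)
{
    uint32_t value = 0;
    for (size_t i = 0; i < n; i++) {
        value = (value << 8) | octets[i];   /* shift in one base-256 digit */
    }
    return value;
}
```

Called on the octets 000/065 and on 000/000/000/065, this sketch returns 65 in both cases, which is the equality the note describes. A non-zero leading octet changes the result: 032/065 gives 32*256 + 65 = 8257.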
This is to be contrasted with the current DIS 10646, where the three values would come out to:

Unicode U+0041         ==> decimal 65
10646 032/065          ==> decimal 8257
10646 032/032/032/065  ==> decimal 538,976,321

Incidentally, this numerical values problem is not just numerology. Making the ASCII character value = the 16-bit character value = the 32-bit canonical form character value is a MAJOR help to conversion of existing software, and to my mind is the strongest argument by far for agreeing to abandon the C0 restriction. The second strongest has to do with value contiguity, range-checks, and table-size. Only the third level of the argument has to do with the overall coding space-size--and even that one is important!

=============

Areas of Consensus 11, NULL Characters in the C language

"Unicode uses NULL (000) as the first or second octet of the 2-octet code. The C language uses the NULL (000) octet as a string terminator for the char* data type. Therefore, Unicode cannot be used with the char* data type."

As stated, this is flat-out erroneous. First of all, the char* data type is of type POINTER, not of type CHARACTER. Strings in C are implemented with POINTER's to CHARACTER's. In ordinary C libraries, those strings are char*; in C libraries which support the wchar_t CHARACTER type, strings can be implemented as wchar_t*, and those ARE consistent with Unicode.
Also, while it is true that the C0 restriction would prevent random 0x00 byte values in 16-bit character data from being interpreted as 8-bit character string terminators if the 16-bit character data was wrongly fed to 8-bit string interfaces, DIS 10646 as it currently stands is no fix, since it also breaks 8-bit C string interfaces:

    A := "1"    (pseudo-code: assign a one-CHARACTER string to some suitable CHARACTER array)
    B := "2"

    char* a = A;
    char* b = B;
    static char c[3];

    strncpy (c, a, 1);
    strncat (c, b, 1);

If A and B have 10646 data in one-byte compaction form, c has the value "12"
If A and B have 10646 data in two-byte compaction form, c has the value " ", i.e. a single SPACE.
If A and B have 10646 data in four-byte compaction form, c has a malformed value for half a CHARACTER.

What I am getting at is that all C code designed for 8-bit character interfaces has to be rewritten to handle multiple-octet codes. Unicode or DIS 10646 both have this problem. All multiple-byte character encodings have this problem. And ALL computer languages have this problem (Assembly, Cobol, Fortran, Forth, Pascal, Modula, C++, APL, Lisp, Snobol, SmallTalk, Eiffel, Icon, ...) --they are ALL broken if interfaces to handle strings in 8-bit units get handed strings with characters encoded in 16-bit or 32-bit units. They ALL need to be fixed.
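The pseudo-code above can be tried out directly. What follows is only a sketch in C under stated assumptions: the function name join_first_chars and the array names are hypothetical, the DIS 10646 two-octet form of the digit "1" is assumed to be 032/049 by analogy with 032/065 for LATIN CAPITAL LETTER A, and the 16-bit Unicode form is assumed to be stored most significant octet first.

```c
#include <string.h>

/* Hypothetical helper mirroring the pseudo-code: copy one char from a
 * into out, then append one char from b. out must hold 3 chars. */
void join_first_chars(char out[3], const char *a, const char *b)
{
    memset(out, 0, 3);     /* stands in for the zero-initialised static array */
    strncpy(out, a, 1);    /* copies one octet, possibly only part of a character */
    strncat(out, b, 1);    /* appends one octet, or nothing if b starts with 0x00 */
}

/* The digits "1" and "2" as the octets an 8-bit string interface sees. */
const char one_octet_1[] = { 0x31, 0x00 };              /* one-octet form */
const char one_octet_2[] = { 0x32, 0x00 };
const char dis2_1[]      = { 0x20, 0x31, 0x00 };        /* assumed 032/049 */
const char dis2_2[]      = { 0x20, 0x32, 0x00 };        /* assumed 032/050 */
const char uni16_1[]     = { 0x00, 0x31, 0x00, 0x00 };  /* assumed U+0031, high octet first */
const char uni16_2[]     = { 0x00, 0x32, 0x00, 0x00 };  /* assumed U+0032, high octet first */
```

With the one-octet arrays the result is the two octets of "12". With the assumed DIS 10646 two-octet arrays the result is the octets 0x20 0x20, which read as a two-octet character is a single SPACE. With the assumed 16-bit Unicode arrays the result is an empty string, because the leading 0x00 octet ends the copy. None of the multi-octet forms survives the 8-bit interface.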
--Ken Whistler Ù WHISTLER ZARASUN 5/28/91 Ú Ken Whistler hart@aplvm.jhuapl.e 5/28/91 10646M Minutes --Notes ========================================================================
Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2795; Tue, 28 May 91 12:07:32 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 7373; Tue, 28 May 91 12:07:13 EDT Received: from vnet.ibm.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 12:07:11 EDT Received: from TOROLAB5 by vnet.ibm.com (IBM VM SMTP V2R1) with BSMTP id 4756; Tue, 28 May 91 11:46:04 EDT Date: Tue, 28 May 91 09:37:44 EDT From: schein@torolab5.vnet.ibm.com To: HART%APLVM.BITNET@cunyvm.cuny.edu Subject: 10646 voting list Austria, Belgium, Brazil, Canada, China, Czechoslovakia, Denmark, Egypt, France, Germany, Israel, Italy, Japan, Korea, Netherlands, Poland, Sweden, Switzerland, Turkey, UK, USA, USSR, Yugoslavia Plus: Australia, Finland, Hungary, Ireland, Tunisia Isai P.S. 
In your cover letter make sure that it refers to SC2 activities Ù SCHEIN TOROLAB5 5/28/91 Ú schein@torolab5.vne HART%APLVM.BITNET@c 5/28/91 10646 voting list ======================================================================== Resent-Date: Wed, 29 May 91 11:01:15 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from MITVMA.MIT.EDU by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3101; Wed, 29 May 91 04:21:49 EDT Received: from MITVMA by MITVMA.MIT.EDU (Mailer R2.05) with BSMTP id 4001; Wed, 29 May 91 04:23:32 EDT Received: from relay1.UU.NET by mitvma.mit.edu (IBM VM SMTP R1.2.1MX) with TCP; Wed, 29 May 91 04:23:30 EDT Received: from microsoft.UUCP (LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA18879; Wed, 29 May 91 04:22:17 -0400 From: microsoft!michelsu@uunet.uu.net Message-Id: <9105290822.AA18879@relay1.UU.NET> To: uunet!APLVM.BITNET!HART@uunet.UU.NET Cc: asmusf@uunet.UU.NET, ibmtoron?schein@uunet.UU.NET, sun!AppleLink.Apple.com!davis.mark@uunet.UU.NET, sun!Xerox.COM!Joseph_D._Becker.osbunorth@uunet.UU.NET, sun!metaphor.com!whistler@uunet.UU.NET Subject: Re: Current Draft of Ad Hoc Meeting Date: Wed May 29 10:15:21 1991 ----------------------------Original message---------------------------- I agree with Asmus and Isai comments. I already sent an answer to Ed but the message bounced. TRying again. Michel Suignard | To: uunet!APLVM.BITNET!HART | Subject: Re: Your Endorsement and JTC1 mailing | Date: Tue May 28 20:41:09 1991 | | | Question 1: So far, I have only received one response to the current draft. | | Please send me E-mail that states either 1) you endorse the statement as | | written or 2) you have concerns and you do not endorse the paper. | | I endorse the statement as stated. I really feel that if we start now | arguing about the sentences we will never stop. I fully agree with Isai | on this topic. 
| | The only remark I have about the document are the incomplete references to | 2 annexes: | 1) Willy Bohn proposal, | 2) Floating marks. | The annexes should be there and there references set accordingly or I could | also survive with their removal as long as they are not referred to. | | | Question 2: Do you want to distribute the document INFORMALLY as agreed? | | informally or formally through JTC1 (put my name as one of the experts) | | Again I agree with Isai. I don't care about formally or informally but for | sure I want it to be distributed to JTC1/SC2 recipients. | If you want to go with the formal way then I have no problem with your | proposed wording. | | Finally to get a chance to be endorsed by the French body I had to start | circulating the document in its current shape (with a note about possible | change). As long as you change the date on the final document (from May | 23rd's which is the version I have circulated) it will be fine. | | Michel Suignard | Ú Edwin Hart Issues with Merging 05/29/91*Current Draft of Ad Hoc Meeti ======================================================================== Resent-Date: Wed, 29 May 91 11:01:30 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3028; Tue, 28 May 91 19:16:38 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 6718; Tue, 28 May 91 19:16:19 EDT Received: from uucp-gw-1.pa.dec.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 19:16:16 EDT Received: by uucp-gw-1.pa.dec.com; id AA29087; Tue, 28 May 91 16:16:59 -0700 Received: by mts-gw.pa.dec.com; id AA05251; Tue, 28 May 91 16:04:28 -0700 Message-Id: <9105282304.AA05251@mts-gw.pa.dec.com> Received: from decwet.enet; by decpa.enet; Tue, 28 May 91 16:04:28 PDT Date: Tue, 28 May 91 16:04:28 PDT From: "F. 
Avery Bishop 28-May-1991 1602" To: hart%aplvm.bitnet@cunyvm.cuny.edu Subject: RE: Your Endorsement and JTC1 mailing ----------------------------Original message---------------------------- Q1: I endorse the statement as written. Q2: I would prefer a formal mailing if it can be done within the bylaws of the relevant bodies. Q3: Either is OK Avery Ú Edwin Hart Issues with Merging 05/29/91*Your Endorsement and JTC1 mai ======================================================================== Resent-Date: Wed, 29 May 91 11:01:44 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from MITVMA.MIT.EDU by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2992; Tue, 28 May 91 17:16:37 EDT Received: from MITVMA by MITVMA.MIT.EDU (Mailer R2.05) with BSMTP id 7137; Tue, 28 May 91 17:16:04 EDT Received: from relay1.UU.NET by mitvma.mit.edu (IBM VM SMTP R1.2.1MX) with TCP; Tue, 28 May 91 17:16:01 EDT Received: from microsoft.UUCP (LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA19955; Tue, 28 May 91 17:14:41 -0400 From: microsoft!asmusf@uunet.uu.net Message-Id: <9105282114.AA19955@relay1.UU.NET> To: uunet!APLVM.BITNET!HART@uunet.UU.NET Cc: ibmtoron?schein@uunet.UU.NET, michelsu@uunet.UU.NET, sun!AppleLink.Apple.com!davis.mark@uunet.UU.NET, sun!Xerox.COM!Joseph_D._Becker.osbunorth@uunet.UU.NET, sun!metaphor.com!whistler@uunet.UU.NET Subject: Re: Current Draft of Ad Hoc Meeting Date: Tue May 28 13:47:14 1991 ----------------------------Original message---------------------------- I am substantially pleased with your minutes and don't think that I have objections other than perhaps minutiae that would prevent it from being distributed to the intended audience. I agree with Isai that speed is important. A. 
Ú Edwin Hart Issues with Merging 05/29/91*Current Draft of Ad Hoc Meeti ======================================================================== Resent-Date: Wed, 29 May 91 11:02:16 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> X-Delivery-Notice: SMTP MAIL FROM does not correspond to sender. Received: from APLVM (SMTP) by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2917; Tue, 28 May 91 14:50:52 EDT Received: from mailer.jhuapl.edu by APLVM.JHUAPL.EDU (IBM VM SMTP R1.2.1) with TCP; Tue, 28 May 91 14:50:50 EDT Received: by mailer.jhuapl.edu (5.57/1.12) id AA06832; Tue, 28 May 91 14:51:30 EDT Received: from zarasun.Metaphor.COM by relay.metaphor.com (4.1/SMI-4.1) id AA02105; Tue, 28 May 91 11:44:33 PDT Received: by zarasun.Metaphor.COM (4.1/SMI-4.0) id AA09690; Tue, 28 May 91 11:45:00 PDT Date: Tue, 28 May 91 11:45:00 PDT From: whistler@zarasun.metaphor.com (Ken Whistler) Message-Id: <9105281845.AA09690@zarasun.Metaphor.COM> To: hart@aplvm.jhuapl.edu Subject: 10646M Minutes Cc: kernaghan@hq.m4.metaphor.com, whistler@zarasun.metaphor.com ----------------------------Original message---------------------------- Ed, I am reviewing the revised minutes right now and will send shortly any suggestions I note. Metaphor concurs with Isai's note that the overriding concern is to distribute the document quickly, as minutes to the meeting. We do not take it as a formal proposal yet, and do not much care exactly what channels the document is distributed in. I have in hand Olle's note, and have to agree with him about the ¬G and ¬Z codes. Don't use them in email which hits the Internet--I had to edit them all out of the document, though fortunately they did not result in any truncations. Note, though, that the ¬? in Olle J¬?rnefors (Ja"rnefors) name DID result in a critical truncation of the document, since it represented an assignment of responsibility to him! 
--Ken Whistler Ú Edwin Hart Issues with Merging 05/29/91 10646M Minutes ======================================================================== Resent-Date: Wed, 29 May 91 11:02:35 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from SEARN.SUNET.SE by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2815; Tue, 28 May 91 12:47:22 EDT Received: from SEARN by SEARN.SUNET.SE (Mailer R2.05) with BSMTP id 1185; Tue, 28 May 91 18:50:42 +0200 Received: from kth.se by SEARN.SUNET.SE (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 18:50:33 +02 Received: by kth.se (5.61+IDA/KTH/LTH/6.0) id AAkth23818; Tue, 28 May 91 18:47:17 +0200 Date: Tue, 28 May 91 18:47:17 +0200 From: Olle Jarnefors Message-Id: <9105281647.AAkth23818@kth.se> To: HART@APLVM.BITNET, ojarnef@admin.kth.se Subject: Re: Current Draft of Ad Hoc Meeting ----------------------------Original message---------------------------- Just some small points: > >2. The list of electronic addressees > >==================================== > > > > "Olle Ja{rnefors" , > > > >The _international_ form of my name is "Olle Jarnefors". > I fixed your entry to J{rnefors. It is better to use the form "Jarnefors". ("J{rnefors" is only understandable for people in the Scandinavian countries and in Germany using national 7-bit codes instead of the REAL Ascii.) > >5. List of participants > >======================= > > Did I make the correct changes? I think so, yes. > >6. C0-C1 restriction > >==================== > > I have not decided about the other point yet because I thought our intent was > > . . . careful review by experts (from the computer communications, systems, > and applications disciplines within our enterprises and from > ISO, ECMA, CCITT, etc.), we believe it desirable . . . That wording is OK. > >7. Non-spacing marks > >==================== > > > >According to my notes we also decided that "all sequences of > >codes should be allowed". 
(Joe Becker had argued that there is > >no practical way to enforce a legislation against certain > >sequences of codes.) Do you have any comment on this suggested addition? > >> I took the liberty of placing your name on the task list for the task of > >> determining the other compaction methods to be specified in 10646M beyond > >> a 2-octet form for the base multilingual plane and the 4-octet canonical > >> form. I thought we needed some balance by having coordinators outside of > >> North America, especially since the people from Unicode only want a 2-byte > >> and > >> 4-byte compaction method, and several of the European standards bodies have > >> previously stated that they need a 1-byte compaction method. > > I think the people from Unicode do not care if any other compaction > methods are used as long as they have their 2-byte mode and it is the > default. (This is my opinion and may not be true.) > > I believe the question is: What should the other compaction methods be? > Should 1-byte compaction be allowed? > I understand from Mike Ksar that this is very important to many > countries. > Should 3-byte compaction be allowed? > This may have value for the ideographic scripts. Check with C/J/K > countries. > Should compaction mode 5 (mixed number of bytes per character in the > data stream) be allowed? > I think these are the issues. Please review them and make a > recommendation and state the reason for the recommendation. So the "J" in Action Item 14 _did_ mean "Jarnefors" then! Of course I accept this assignment. Shall I prepare the recommendation to the next ad hoc meeting in Geneva and send it to the new distribution list around 5 Aug? I would like to contact those that took part in the discussion on compaction forms. Do you remember which persons did do that? > Thank your for your clarifying remarks. I found them to be valuable to the > document (and some additional e-mail problems I must work around). 
And thank _you_ for your persistent effort to bring the two sides closer together, for arranging the informal meeting in San Francisco and for producing the excellent minutes. We seem to have almost found the type of compromise that both SHARE and ITS (Swedish standards body) have asked for. Ú Edwin Hart Issues with Merging 05/29/91*Current Draft of Ad Hoc Meeti ======================================================================== Resent-Date: Wed, 29 May 91 11:03:51 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 2695; Tue, 28 May 91 08:44:47 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 8440; Tue, 28 May 91 08:44:28 EDT Received: from vnet.ibm.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Tue, 28 May 91 08:43:57 EDT Received: from RALVMK by vnet.ibm.com (IBM VM SMTP V2R1) with BSMTP id 3601; Tue, 28 May 91 08:42:52 EDT Date: Tue, 28 May 91 08:42:04 EDT From: andersen@ralvmk.vnet.ibm.com To: hart%APLVM.bitnet@cunyvm.cuny.edu Subject: Ad-hoc paper ----------------------------Original message---------------------------- 1) The paper is fine with me 2) Yes 3) Yes Regards, Jerry Ú Edwin Hart Issues with Merging 05/29/91 Ad-hoc paper ======================================================================== Date: Wed, 29 May 91 11:19:28 EDT From: Edwin Hart Subject: Electronic Distribution is Ready To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> The electronic distribution is ready and your name has already been activated. To use it, simply send mail to 10646M@JHUVM.BITNET. It will then distribute your item to everyone else on the list but you (since you sent the information, it presumes you do not need a copy). Please start sending mail directly to the list instead of to me. Thanks for all of your comments. I will shortly make a decision on what will go into the final version of the paper, mail it out, and upload it. I heard two messages: GET IT OUT SOON, and IF POSSIBLE, USE THE FORMAL ROUTE. Right now, I am thinking of sending it informally to some of the JTC1 member bodies and formally to JTC1 and WG2 for them to distribute to everyone.
Best regards, Ed Ú Edwin Hart Issues with Merging 05/29/91 Electronic Distribution is Re ======================================================================== Date: Thu, 30 May 91 10:21:58 EDT From: Edwin Hart Subject: Futures To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> I just wanted to clarify a few of the ideas that I have to be sure we are all thinking on the same track. 0. We all need to continue to be diplomatic and be less sensitive to any less-than-diplomatic criticism from our peers. 1. We are looking for a compromise that *ALL* of us can live with. That means that if we can agree on the major points, we *ALL* need to be flexible on some of the less-major points. In other words, please plan that 10646M will have some characteristics of DIS 10646 and some from Unicode, but do not plan to include every Unicode feature or every 10646 feature into 10646M. Realistically, everyone's pet features cannot all be in 10646M if we are to reach consensus. 2. Our work needs to be merged back into the ISO WG2 activities starting with the August meeting. After we give the proposal to ISO, it is ISO's decision what to do with it, what changes to make to DIS 10646, etc. After WG2 decides on the changes, then editing DIS 10646 may begin. I changed the editing action item with this in mind. 3. We are producing a proposal with specific recommendations. It is quite likely that both the Unicode Consortium and WG2 will suggest changes. Although I would hope that the changes are only ones to fine tune the merged standard, that may not be the case. Remember that ISO must evaluate each comment made from around the world (not just ours) and decide what actions to take. 4. I deleted the action item and references to another ad hoc meeting just before the WG2 meeting. 5. Concerning another ad hoc meeting, we need to decide whether to have it or not.
To me the decision is whether to complete our current draft proposal and obtain consensus on the remaining issues, etc. or to simply fold that activity into WG2. If we need the time, Mike Ksar has offered to extend the WG2 meeting time to accommodate such a discussion as part of the WG2 meeting. What do you think? If you disagree, be diplomatic. I am not taking a position yet. 6. I changed the action item for the June 7 Unicode meeting to add that Unicode needs to issue a statement that they approve of the general direction to merge the two codes (and list any concerns that they may have). This was part of our agreement in San Francisco but not in the action item. 7. I believe that we agreed that the merged code would be a 4-byte code rather than a 2-byte code. We did not discuss the architecture/structure of the resulting code. 8. We may continue to need some type of 2-byte half plane switching to logically bring the Japanese, Korean, and other planes into the BMP. We did not agree to this but Japanese support of CJK-JRG would seem to require continued support of this feature. Our secretary is typing the mail labels for selected JTC1 members now. I am going to call the JTC1 secretariat in ANSI to try to clear the way for rapid distribution.
Ed Ú Edwin Hart Issues with Merging 05/30/91 Futures ======================================================================== Resent-Date: Thu, 30 May 91 15:17:14 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3859; Thu, 30 May 91 13:07:23 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 5506; Thu, 30 May 91 13:07:05 EDT Received: from vnet.ibm.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Thu, 30 May 91 13:07:04 EDT Received: from TOROLAB5 by vnet.ibm.com (IBM VM SMTP V2R1) with BSMTP id 8813; Thu, 30 May 91 13:05:59 EDT Date: Thu, 30 May 91 13:05:50 EDT From: schein@torolab5.vnet.ibm.com To: HART%APLVM.BITNET@cunyvm.cuny.edu Subject: AD-HOC meeting in Geneva ----------------------------Original message---------------------------- Ed, the importance of ad-hoc vs WG2 meeting is in who is going to control it. From experience, I am afraid that meeting controlled by Mr. Ksar will have much less chance to proceed and end in harmony. It will also allow other people (Klaus) easier participation. After we prepare the agreed document in AD-HOC meeting, it will be much more difficult to kill it later. 
Isai Ú Edwin Hart Issues with Merging 05/30/91 AD-HOC meeting in Geneva ======================================================================== Resent-Date: Thu, 30 May 91 15:27:48 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3185; Wed, 29 May 91 10:35:21 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 1302; Wed, 29 May 91 10:35:00 EDT Received: from vnet.ibm.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Wed, 29 May 91 10:34:59 EDT Received: from TOROLAB5 by vnet.ibm.com (IBM VM SMTP V2R1) with BSMTP id 1258; Wed, 29 May 91 10:18:24 EDT Date: Wed, 29 May 91 10:20:08 EDT From: schein@torolab5.vnet.ibm.com To: HART%APLVM.BITNET@cunyvm.cuny.edu Subject: Address for Belgium ----------------------------Original message---------------------------- I am attaching message from Willy Bohn: ----------------------------------------- 'MSG FROM: PAECH1 --GHQVM1 TO: SCHEIN --TOROLAB5 29.05.91 15:19:07a To: SCHEIN --TOROLAB5 *** Reply to note of 26/05/91 14:13 From: Wilhelm Friedrich Bohn (Willy), +49 711 785-3209 Dep. 3889, Bldg. 7000-01 IBM Deutschland, 7000 Stuttgart 80 Subject: Your Endorsement and JTC1 mailing Isai, thank you for sending me the information. I have no problem with the text as written. If you have an idea what I must do to be able to be reached from the outside please let me know. If you feel that it is necessary please inform Ed Hart of my endorsement of his report. I can agree to both forms of distribution but would prefer the official form. The date of the next meeting can then be communicated to those who want or need to know by other channels. . Auf Wiedersehen and Regards . 
Willy Bohn, GHQVM1(PAECH1) Ú Edwin Hart Issues with Merging 05/30/91 Address for Belgium ======================================================================== Resent-Date: Thu, 30 May 91 15:28:54 EDT Resent-From: Edwin Hart Resent-To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Received: from CUNYVM.BITNET by APLVM.JHUAPL.EDU (Mailer R2.02A) with BSMTP id 3335; Wed, 29 May 91 12:55:39 EDT Received: from CUNYVM by CUNYVM.BITNET (Mailer R2.07) with BSMTP id 7339; Wed, 29 May 91 12:55:07 EDT Received: from relay.hp.com by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with TCP; Wed, 29 May 91 12:55:00 EDT Received: from hpcea.ce.hp.com by relay.hp.com with SMTP (16.5/15.5+IOS 3.13) id AA10802; Wed, 29 May 91 09:55:46 -0700 Received: by hpcea.ce.hp.com (15.11/15.5+IOS 3.22) id AA00917; Wed, 29 May 91 09:58:37 pdt From: Mike Ksar Message-Id: <9105291658.AA00917@hpcea.ce.hp.com> To: HART%APLVM.BITNET@CUNYVM.CUNY.EDU (Edwin Hart) Date: Wed, 29 May 91 9:58:35 PDT Subject: Re: draft cover letter Cc: ksar@hpcea.ce.hp.com In-Reply-To: Message from "Edwin Hart" of May 29, 91 at 12:35 (noon) X-Mailer: Elm Ýversion 1.5¨ ----------------------------Original message---------------------------- Hello Ed, I have a few comments on your draft letter to JTC1. 1. Before you send it I recommend that you clear it with JTC1 Secretariat, ANSI (NY). The contact name is Fran Schrotter. It is still up to you to send it, but I think her support to you will be invaluable. 2. When you talk about the informal meeting, it is important to preface that paragraph with the fact that it was held outside WG2 and that WG2 did not take any decisions to affect the structure of DIS 10646. Right now you end the paragraph that it was not a WG2 meeting. Let me know what the result of your contacts with JTC1 Secretariat are? 
Best regards Mike > > > Johns Hopkins University > Applied Physics Laborato > Laurel, MD 20723-6099 > USA > 28 May, 1991 > > > > > To: Members of ISO-IEC JTC1 > From: Edwin Hart, USA > Subject: Personal Contribution on DIS 10646: Merging 10646 and > Unicode > > > Recently, we held an informal discussion between proponents > of ISO-IEC DIS 10646 (from JTC1/SC2/WG2) and Unicode (from > the Unicode Consortium) for the purpose of merging the two > incompatible codes into one code. We achieved a > breakthrough because the diverse group was able to achieve > consensus on a number of issues that divided 10646 and > Unicode. Although several issues remain to be resolved, it > is appropriate to share this good news with you and ask for > your support of this effort by communicating it to the > members of your national standards body. > > > A number of information users and developers are concerned about > the real possibility that we will need to support two incompatible > multi-octet codes, ISO 10646 and Unicode. Some may say quite > correctly that Unicode is not a standard and therefore deserves > neither support nor recognition. However, we live in an imperfect > world where regardless of whether Unicode is an international > standard or not, many of us will be forced to support it unless we > do something soon. For the reasons stated in the enclosed > document, I believe that the world is too small to have two > incompatible multi-octet codes with the same goal. I also believe > that both DIS 10646 and Unicode complement each other and have > features valuable to a multi-octet code. Therefore, an > international standard that merges the best features of DIS 10646 > and Unicode makes good sense to me, and I hope to you also. That > is my goal. > > In May, the 10646 Working Group, JTC1/SC2/WG2, met in San > Francisco, California, USA. This appeared to be the perfect time > and place to hold a discussion between the 10646 proponents and the > Unicode proponents. 
The results of such discussions could be > extremely useful in resolving issues if the DIS 10646 should fail > to obtain a majority of the ballots. Although we wanted to hold > these discussions at the WG2 meeting, JTC1 rules prevented > discussing any changes to DIS 10646 while it was out for ballot. > we did not discuss it there. Rather, after the WG2 meeting ended, > several of us met informally to discuss merging the two codes into > one. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The above paragraph could be modified per my note 2 above. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > I believe that we achieved a breakthrough because we were able > to achieve a consensus on several issues that divided 10646 and > Unicode. This was particularly encouraging because the > participants presented a diverse industry cross-section. We came > from eight countries, over a dozen (12) different enterprises, > included both product developers and users, and represented both > the 10646 and Unicode codes. If it was a breakthrough that we had > the discussions, it was a miracle to achieve consensus among such > a diverse group. The initial results are enclosed for you to read > and reach your own conclusions. > > While encouraging as a first step, the proposal needs additional > work. When the proposal to merge DIS 10646 and Unicode is completed, > I will submit it to JTC1/SC2/WG2 and JTC1/SC2 for consideration. > Meanwhile, I am making the draft available for your consideration, > your comments, and if you think it appropriate, your support. > > Thank you for your consideration. 
> > > Sincerely, > > > > Edwin Hart > Ú Edwin Hart Issues with Merging 05/30/91*draft cover letter ======================================================================== Date: Thu, 30 May 91 15:50:35 EDT From: Edwin Hart Subject: Final Cover Letter To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM> Enclosed is the "final" cover letter. It is being reproduced now. ________________________________________________________________________ Johns Hopkins University Applied Physics Laborato Laurel, MD 20723-6099 USA 30 May, 1991 To: Members of ISO-IEC JTC1 From: Edwin Hart, USA Subject: Personal Contribution on DIS 10646: Merging 10646 and Unicode Recently, we held an informal discussion between proponents of ISO-IEC DIS 10646 (from JTC1/SC2/WG2) and Unicode (from the Unicode Consortium) for the purpose of exploring the possibility of merging the two incompatible codes into one code. We achieved a breakthrough because the diverse group achieved consensus on several issues that divided 10646 and Unicode. Although several issues remain to be resolved, and our proposal needs to be accepted by the formal organizations involved, it is appropriate to share this good news with you. We also ask for your support of this effort by communicating it to the members of your national standards body and commenting on it in your ballot on DIS 10646. Many information users and developers are concerned about the real possibility that we will need to support two incompatible multi-octet codes, ISO 10646 and Unicode. Some may say that Unicode is not an international standard and therefore deserves neither support nor recognition. However, we live in an imperfect world where regardless of whether Unicode is an international standard or not, many of us will have to choose to support it unless we do something soon. For the reasons stated in the enclosed document, I believe that the world is too small to have two incompatible multi-octet codes with the same goal. 
I also believe that both DIS 10646 and Unicode complement each other and have features valuable to a multi-octet code. Therefore, an international standard that merges the best features of DIS 10646 and Unicode makes good sense to me, and I hope to you also. That is my goal. In May, the 10646 Working Group, JTC1/SC2/WG2, met in San Francisco, California, USA. This appeared to be the perfect time and place to hold a discussion between the 10646 proponents and the Unicode proponents. The results of such discussions could be extremely useful in resolving issues if the DIS 10646 should fail to obtain a majority of the ballots. Although we wanted to hold these discussions at the WG2 meeting, JTC1 rules prevented discussing any changes to DIS 10646 while it was out for ballot. Accordingly, we did not discuss any changes to DIS 10646 at the WG2 meeting. Rather, after the meeting ended, we met informally to discuss merging the two codes into one. I believe that we achieved a breakthrough because we achieved consensus on several issues that divided 10646 and Unicode. This was particularly encouraging because the participants presented a diverse industry cross-section. We came from eight countries, over a dozen (12) different enterprises, included both product developers and users, and represented both the 10646 and Unicode codes. If it was a breakthrough that we had the discussions, it was a miracle to get consensus among such a diverse group. The initial results are enclosed for you to read and reach your own conclusions. While encouraging as a first step, the proposal needs additional work. When the proposal to merge DIS 10646 and Unicode is completed, I will submit it to JTC1/SC2/WG2 and JTC1/SC2 for consideration. Meanwhile, I am making the draft available for your consideration, your comments, and if you think it appropriate, your support. Thank you for your consideration. 

Sincerely,

Edwin Hart

Edwin Hart Issues with Merging 05/30/91 Final Cover Letter
========================================================================
Date: Thu, 30 May 91 15:52:53 EDT
From: Edwin Hart
Subject: Final Version of Draft Document, Part 1
To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM>

This is part 1 of the final DRAFT document. It is being reproduced now. Thanks for all of your input. If you do not like the final result, yell at me.

Ed
_______________________________________________________________________

Document: 10646M/91-01
Date: 30 May, 1991
Subject: Summary of Results of Informal Meeting to Discuss Merging of DIS 10646 and Unicode into One Code
From: Edwin Hart, Moderator 10646M (Merger) Ad Hoc Group
Reply to: Edwin Hart
          Johns Hopkins University Applied Physics Laboratory
          11100 Johns Hopkins Road
          Laurel, MD 20723-6099
Electronic Mail: HART@APLVM.BITNET or HART@APLVM.JHUAPL.EDU
Voice: +1 (301) 953-6926
Facsimile: +1 (301) 953-1093

This document represents the first draft of what we hope will become a proposal to merge DIS 10646 and Unicode into one code. The primary advantage of this proposal is that it is built on the consensus of people supporting ISO 10646 and others supporting Unicode. We plan to submit a final consensus document to WG2 for consideration at the WG2 editing meeting planned for August, 1991 in Geneva, Switzerland. At that time, we plan to work within WG2 to refine the 10646 standard.

Summary

We affirm our strong support of the effort by ISO-IEC JTC1/SC2/WG2 to develop 10646. We believe that ISO, with its open and responsive procedures, will give careful consideration to our proposal to refine DIS 10646. In addition, we believe that the Unicode Consortium has provided valuable insight and technical solutions to newer requirements.
We also believe that having a single international standard that incorporates the best features of DIS 10646 and Unicode as outlined in this proposal is far superior to having two incompatible standards with the same goal.

Therefore, after the completion of the May, 1991 ISO-IEC JTC1/SC2/WG2 meeting in San Francisco, California in the USA, the delegates attended an informal meeting. At the meeting, we discussed requirements to merge ISO-IEC DIS 10646 and Unicode. The people attending the informal meeting included some who favored the ISO 10646 code and others who favored Unicode. We believed that achieving consensus among these people would lead to a merger proposal more likely to be supported by ISO-IEC JTC1/SC2 and the Unicode Consortium.

In view of the diverse views represented at the meeting, the results are surprisingly positive. We succeeded in reaching a consensus on major design issues that had previously separated the DIS 10646 and Unicode codes and made them incompatible. We believe that this proposal paves the way for a merger of the best features of DIS 10646 and Unicode into one multi-octet code standard. Yet, this is merely a first step; further work and consensus are required to produce a final proposal.

In summary, although ISO and the Unicode Consortium have not yet endorsed this proposal, it is promising because it was the result of a consensus of many people who represented both the ISO 10646 and Unicode Consortium efforts. However, our work would have been almost impossible had it not been preceded by the excellent proposals submitted to WG2 by ECMA, Canada and China. To form our consensus, we used these proposals and new information on the Chinese, Japanese and Korean Joint Research Group (CJK-JRG) announced at the WG2 meeting in San Francisco.
We believe this new proposal is very promising, and those attending agreed to work to build support for it within their respective companies and their national and industry standards bodies, including ECMA and the Unicode Consortium.

General Objectives

We adopted the following objectives for the group:

1. Create a proposal to merge the best features of DIS 10646 and Unicode such that the proposal is acceptable to both ISO and the Unicode Consortium.
2. Increase cooperation between ISO-IEC JTC1/SC2 and the Unicode Consortium.
3. Define action items and the timing to complete them.

Participants

Except for Mr. Jenkins, the following people participated in the Wednesday afternoon discussions:

Jerry Andersen          IBM, USA
Lloyd Anderson          Ecological Linguistics, USA
Joseph Becker           Xerox, USA
F. Avery Bishop         Digital, USA
Willy Bohn              University of Hanover, Germany
Mark Davis              Apple, USA
Asmus Freytag           Microsoft, USA
Joachim Friemelt        Siemens, Germany
Edwin Hart              SHARE Inc./Johns Hopkins University, USA
Masami Hasegawa         Digital Japan
Huang, Weimin           CESI, China
Olle Jarnefors          Royal Institute of Technology, Sweden
John Jenkins            Apple, USA
Bo Jensen               IBM Denmark
Mike Ksar               HP, USA
Takayuki Sato           HP Japan
Isai Scheinberg         IBM Canada
Karen Smith-Yoshimura   The Research Libraries Group, USA
Michel Suignard         Microsoft, France
J. G. Van Stee          IBM, USA
Kenneth Whistler        Metaphor, USA
Zhang, Zhoucai          CCID, China

On Thursday, Mr. Jenkins joined the group but Mr. Van Stee and Mr. Whistler were absent. In addition, Mr. Jenkins left before voting, and Mr. Hasegawa, Mr. Ksar, and Mr. Bohn were unable to stay for all the votes.

On Friday, except for Mr. Friemelt (who had to leave before we concluded the meeting), the following participated in the voting: Mr. Anderson, Mr. Bishop, Mr. Bohn, Mr. Freytag, Mr. Friemelt, Mr. Hart, Mr. Hasegawa, Mr. Jenkins, Mr. Sato, Mr. Scheinberg, and Mr. Suignard.

Advantages of Having Only One Multi-Octet Code Standard

Here is a list of advantages to having one global multi-octet code standard:

1. Why should we be concerned about two standards?
   a. Inevitable requirement to support both
      i.  10646 because it is an international standard
      ii. Unicode for compatibility with Unicode-based products
   b. Cost of supporting both
      i.  The cost to do both is probably very large
      ii. Must consider the costs to convert between the two
   c. Erosion of single code standard mind-set
      i.  If two, why not three? four? ten?
   d. Diminishes advantages of either alone without the other
      i.  Single code standard solves many problems that would not be solved if we have two or more of them
      ii. May introduce the requirement to switch between the two
2. The importance of de jure standards
   a. Increasingly used as procurement requirements
      i.  Gives customer more options for interconnection of products from different vendors
   b. Integral part of vast, interlocking family of standards, each assuming the others
   c. Better acceptance, because every country can participate
      i.  Not perceived as dominated by U.S.
3. Problems of code conversion
   a. Must identify both the source and the target code, but often no way to do this
   b. Conversion is application/subsystem dependent, and it often cannot be confined to one place (that is, it is much more expensive)
   c. Solving same problem in several places introduces probability of getting some solutions out of synchronization with others
   d. An uncontrollable, moving target (that is, you never own more than one of the two codes, you cannot control repertoires, etc.)
   e. Complicated by repertoire differences
   f. No right way to manage the differences
      i.  Mismatch can range from minor irritation to catastrophe
   g. Further complicated by differences in character semantics
      i.  No tested solution is known
      ii. At best, makes translation even more difficult
4. The costs of code conversion
   a. Monetary cost of developing, testing, maintaining, etc.
   b. Diversion of human and other resources by developers
   c. Performance and memory penalties (extra overhead)
   d. Errors and other problems are inevitable
   e. Customer dissatisfaction
   f. Customer conversion requirements will divert resources for creating local solutions
   g. Forces tradeoffs between satisfying installed base and meeting new market requirements
5. Other advantages
   a. One reference source for the code

Areas of Consensus

1. Remove the C0 and C1 restrictions.

We support the ECMA proposal, point 1, to remove the restriction on the so-called C1 space. This point is also included in the Canadian proposal and in other national body positions on DIS 10646, including the ones from China and the US.

Vote Thursday: 17 for/ 0 against/ 2 abstain (Davis, Freytag)

In addition, pending a careful review by computer communication, systems, and applications experts from ISO, ECMA, CCITT, and within our enterprises, we believe it desirable to allow encoding graphic characters in the C0 space presently reserved in DIS 10646. This refines point 2 from the Canadian proposal. Annex ____ provides more details on this refinement (the Bohn refinement, named for Willy Bohn, who proposed it) of the ECMA proposal.

Vote Thursday: 16 for/ 0 against/ 3 abstain (Bishop, Hasegawa, Sato)

Removing the C0 restriction in addition to removing the C1 restriction will provide flexibility by allowing the encoding of more characters in the base multilingual plane, which is the most important 2-octet plane for interchange and interworking. A consequence of removing the C0 restriction is that 10646 must change the way 1-octet control characters are encoded: the 1-octet control character is placed into the least significant octet of the current compaction method, and the most significant octets are padded to the width of the current compaction method. In addition, the 1-octet compaction method must be adjusted to ensure that the control characters are correctly handled.
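
The control-character change described in the last paragraph can be illustrated with a short sketch. It is not part of the proposal. The function name pad_control, the zero pad octet, and the most-significant-octet-first ordering are all assumptions made for the example. The proposal does not state the pad value, and item 10a of this document mentions padding with the decimal 032 value in the DIS 10646 compaction forms.

```python
def pad_control(octet: int, width: int, pad: int = 0x00) -> bytes:
    """Widen a 1-octet control character to `width` octets.

    The control character goes into the least significant (last) octet
    and the more significant octets are filled with `pad`.
    The default pad of 0x00 is an assumption, not taken from the proposal.
    """
    if not 0x00 <= octet <= 0xFF:
        raise ValueError("control character must fit in one octet")
    if not 0x00 <= pad <= 0xFF:
        raise ValueError("pad value must fit in one octet")
    if width < 1:
        raise ValueError("width must be at least one octet")
    return bytes([pad] * (width - 1) + [octet])


LINE_FEED = 0x0A  # a C0 control character

two_octet = pad_control(LINE_FEED, 2)   # 2-octet form
four_octet = pad_control(LINE_FEED, 4)  # 4-octet canonical form
print(two_octet.hex(), four_octet.hex())  # prints "000a 0000000a"
```

Under the zero-pad assumption, the widened LINE FEED contains 000 octets, which is the same situation that item 10b raises for C programs that treat a 000 octet as a string terminator.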

Edwin Hart Issues with Merging 05/30/91 Final Version of Draft Docume
========================================================================
Date: Thu, 30 May 91 15:56:06 EDT
From: Edwin Hart
Subject: Final Version of Draft Document, Part 2
To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM>

Part 2
________________________________________________________________________

2. Create an International Repertoire of Unified Chinese, Japanese, and Korean Ideographs and Encode This Set of Ideographs into the Base Multilingual Plane.

We propose a refinement to point 5 of the Canadian proposal. We believe that coding an international repertoire of unified Chinese, Japanese, and Korean ideographs in the base multilingual plane is mandatory for international interworking and processing efficiency. The encoding of the international C/J/K repertoire must be completed by the end of 1991. We propose to use the CJK-JRG results if they are available in 1991; otherwise we propose to use the best information available at that time.

Vote Thursday: 17 for/ 0 against/ 1 abstain (Ksar), 1 absent (Hasegawa)

Recent statements by the Japanese delegates to WG2 showed their strong support for the CJK-JRG. From this information, the group concluded that the unification of Chinese, Japanese, and Korean ideographs so highly desired by the international community is feasible. Provided that WG2 continues to recognize the stated Japanese requirement to encode its characters in its own 10646 plane, Japan recognized the need for an international repertoire of Chinese, Japanese, and Korean ideographs. A meeting of the CJK-JRG has been called (Tokyo, July, 1991) to start creating an international repertoire and ordering.

3. Allow the Option to Use Non-Spacing Marks.

Pending careful review by ISO TC46 and CCITT, we propose to refine point iv) 2) of the ECMA proposal for floating diacritical marks as follows: The third Code Extension Level should specify:

a. In addition to diacritics, non-spacing marks should include stress marks, tone marks, and those used for text processing operations such as underlining or mathematical notation for the name of a vector.
b. Non-spacing marks should follow the base character for consistency.
c. Imaging and the order of multiple non-spacing diacritics should follow well-defined rules. (See Annex ____.)
d. To allow for compliance with future versions of 10646 that may encode additional pre-composed characters, allow encoding a character either as a pre-composed character or as a base character with one or more non-spacing marks. (That is, delete the ECMA statement: if the accented letter is already coded as a single character, the alternative representation by means of floating diacritical marks is not allowed.) This assumes that future revisions of 10646 will take certain characters that used floating marks in the current version of 10646 and encode them as pre-composed characters.
e. All sequences of codes should be allowed because of the difficulty of enforcing a rule against certain sequences of code positions.

Vote Thursday: 16 for/ 0 against/ 1 abstain (Sato)/ absent (Bohn, Hasegawa, Ksar)

4. Define the merger (10646M) of DIS 10646 and Unicode as a 4-octet code.

Vote Thursday: 16 for/ 0 against/ 0 abstain/ absent (Hasegawa, Ksar, Bohn)

We support the 4-octet definition of the merger of DIS 10646 and Unicode. Using 4 octets allows the flexibility needed to expand the code repertoire to meet all foreseeable requirements.

5. Location of Space for Presentation Forms

We would support a drastic reduction or elimination of the presentation forms in the base multilingual plane while retaining codes necessary to transcode existing standards in plain text. People were concerned that DIS 10646 reserved too much unused code space in the base multilingual plane. A final determination of the presentation codes will be made in consultation with Arabic and other experts.

Vote Thursday: 15 for/ 0 against/ 1 abstain (Becker)

6. Combine the Repertoires of DIS 10646 and Unicode into the Merged Code.

We propose that the repertoire of the base multilingual plane of the merged code, 10646M, be derived from a superset composed of the union of the repertoires of DIS 10646 and Unicode; for example, the superset should include pre-composed Latin, Greek, Hangul, Vietnamese, and additional symbols.

Vote Friday: 10 for/ 0 against/ 0 abstain

7. Simplify the Compaction Methods.

We are concerned about the complexity of the DIS 10646 compaction forms. For simplicity, we propose that there be several parts to the standard:

Part 1: General introduction, terminology, etc.
Part 2: The base multilingual plane (BMP). This part of the standard will specify the 2-octet implementation of the BMP. Other parts are not required for conforming implementations of the BMP. This part may be implemented without announcers.
Part 3: The full four-octet canonical form.
Part 4: Mechanisms for other compaction methods to be determined.

In the absence of other introducers for 10646 data, Part 2 should be assumed.

Vote Friday: 10 for/ 0 against/ 0 abstain

8. Make Annex H Part of the 10646 Conformance Statement.

We recommend moving Annex H of DIS 10646 into the main body of the standard and making it a requirement for conformance.

Vote Friday: 9 for/ 0 against/ 0 abstain/ 1 absent (Bohn)

Due to time limitations we were unable to discuss and make recommendations to resolve the following differences between DIS 10646 and Unicode.

9. Coding of Semantics versus Shape.

For example, parentheses, brackets and braces are coded as open/close in Unicode, and as left/right in DIS 10646.

10. Using Any Multi-Octet Coded-Character-Set Will Require Program Changes.

The following two examples show that neither DIS 10646 nor Unicode may be blindly used with the C programming language.

a. C Language Wide-Character (wchar_t) Model
   Padding ISO 8859/1 characters with the decimal 032 value precludes the direct use (without conversion) of 10646 compaction forms 2-4 as the wchar_t data type in the C programming language. This is point 3 in the Canadian position statement.
b. NULL Characters in the C Language
   Unicode may use 000 as the first or second octet of the 2-octet code. The C language uses the NULL (000) octet as a character string terminator for 1-octet character data. Therefore, C programs must be rewritten to use Unicode.

11. Other Issues

The above list of differences between Unicode and DIS 10646 is not exhaustive. Other lower priority issues also need to be considered.

Action Items to Promote the Agreement

1. Participants will lobby for this proposal with their country and company constituencies. (All, immediately)
2. Ask the Unicode Consortium member companies to place a discussion of this document on the agenda of the next Unicode Consortium meeting on June 7. The Unicode Consortium should formally state that it agrees or disagrees with the general direction and state any of its concerns with specific points. (Whistler)
3. Form a joint editing committee to help draft the final 10646 merged standard. (Freytag provides updated code tables, Hasegawa provides updated structure and text, 15 Aug. list the areas of the DIS 10646 document that would require changes)
4. For closer cooperation between ISO and the Unicode Consortium, we encourage the Unicode Consortium to pursue becoming a liaison member of JTC1/SC2, and for JTC1/SC2 to accept the Consortium as a liaison member. (Unicode Consortium, Aug., 1991)
5. Send this report to the national bodies and ask them to consider our consensus agreement in their votes on ISO-IEC DIS 10646. (Hart, 29 May)
6. Provide a list of the advantages of having one multi-octet code rather than two. (Andersen, done)
7. (Point 1) Coordinate an investigation of the impact of coding in C0. (Scheinberg, 15 Aug.)
8. (Point 2) Using formal minutes and other information, summarize the Tokyo CJK-JRG meeting. (Collins, 31 July)
9. (Point 3) Provide the Annex describing the rules to be used with multiple non-spacing marks. (Whistler, 9 June)
10. (Point 3) Coordinate review by ISO TC46 and CCITT of proposed use of non-spacing marks. (Smith-Yoshimura (TC46) and Friemelt (CCITT), Aug. 15)
11. (Point 5) Coordinate a review of the need to reserve so large an area for presentation forms for Arabic and other scripts on the base multilingual plane. (Ksar and Friemelt, 15 Aug.)
12. (Point 6) Investigate need for composed characters from Cyrillic and Polytonic Greek. (Why did WG2 include them in the DIS?) (Whistler, 15 Aug.)
13. (Point 7) Coordinate an investigation of which compaction methods to propose in Part 4. (Jarnefors, 15 Aug.)
14. Create 10646M electronic distribution list. Send electronic mail message to Hart to subscribe. (Hart, done)

(End of Document)

Edwin Hart Issues with Merging 05/30/91 Final Version of Draft Docume
========================================================================
Date: Fri, 31 May 91 08:42:04 EDT
From: Edwin Hart
Subject: Draft Proposal to Merge DIS 10646 and Unicode
To: Joan Winters, Brian Eliot, Klaus Daube, Denis Garneau, Kurt Neuenschwander, Alain LaBonte', Marty Marchyshyn, "Thomas Steel, Jr.", Iain Stinson, Lee Varian, Johan van Wingen, Bernard Chombart

Ladies and Gentlemen,

Enclosed are the cover letter and a draft proposal to merge DIS 10646 and Unicode. This is for your information and help. The letter will be sent to JTC1 for distribution to its members (national standards bodies). Due to the delays of using the official channels, I would appreciate it if you would distribute this to your national standards body. Note that I will handle the US, Isai Scheinberg will handle Canada, and Michel Suignard has already given France (AFNOR) a copy of the draft document. The voting deadline for DIS 10646 is June 6 so we have little time.

At this time, we are planning another informal meeting in Geneva, Switzerland from 19 August at 10:00 to 21 August at noon. This is just before the ISO 10646 Working Group meeting (planned for 21 August at ?13:00? until 27 August). We invite people with a sincere desire to merge DIS 10646 and Unicode into one multibyte code. If you or someone else is interested in participating, please inform me by June 28 so that we can ensure that we have appropriate meeting facilities. In July, I will provide final details. Please reconfirm with me in July since events may change the need for this meeting or the dates.

Best regards,

Ed
___________________________________________________________________

Johns Hopkins University Applied Physics Laboratory
Laurel, MD 20723-6099 USA

30 May, 1991

To: Members of ISO-IEC JTC1
From: Edwin Hart, USA
Subject: Personal Contribution on DIS 10646: Merging 10646 and Unicode

Recently, we held an informal discussion between proponents of ISO-IEC DIS 10646 (from JTC1/SC2/WG2) and Unicode (from the Unicode Consortium) to explore the possibility of merging the two incompatible codes into one code. We achieved a breakthrough because the diverse group reached consensus on several issues that divided 10646 and Unicode. Although several issues remain to be resolved, and our proposal needs to be accepted by the formal organizations involved, it is appropriate to share this good news with you. We also ask for your support of this effort by communicating it to the members of your national standards body and commenting on it in your ballot on DIS 10646.

Many information users and developers are concerned about the real possibility that we will need to support two incompatible multi-octet codes, ISO 10646 and Unicode. Some may say that Unicode is not an international standard and therefore deserves neither support nor recognition.
However, we live in an imperfect world where, regardless of whether Unicode is an international standard, many of us will have to support it unless we do something soon. For the reasons stated in the enclosed document, I believe that the world is too small to have two incompatible multi-octet codes with the same goal. I also believe that DIS 10646 and Unicode complement each other and that each has features valuable to a multi-octet code. Therefore, an international standard that merges the best features of DIS 10646 and Unicode makes good sense to me, and I hope to you also. That is my goal.

In May, the 10646 Working Group, JTC1/SC2/WG2, met in San Francisco, California, USA. This appeared to be the perfect time and place to hold a discussion between the 10646 proponents and the Unicode proponents. The results of such discussions could be extremely useful in resolving issues if DIS 10646 should fail to obtain a majority of the ballots. Although we wanted to hold these discussions at the WG2 meeting, JTC1 rules prevented discussing any changes to DIS 10646 while it was out for ballot. Accordingly, we did not discuss any changes to DIS 10646 at the WG2 meeting. Rather, after the meeting ended, we met informally to discuss merging the two codes into one.

I believe that we achieved a breakthrough because we reached consensus on several issues that divided 10646 and Unicode. This was particularly encouraging because the participants represented a diverse industry cross-section. We came from eight countries and over a dozen (12) different enterprises, included both product developers and users, and represented both the 10646 and Unicode codes. If it was a breakthrough that we had the discussions, it was a miracle to get consensus among such a diverse group.

The initial results are enclosed for you to read and reach your own conclusions. While encouraging as a first step, the proposal needs additional work.
When the proposal to merge DIS 10646 and Unicode is completed, I will submit it to JTC1/SC2/WG2 and JTC1/SC2 for consideration. Meanwhile, I am making the draft available for your consideration, your comments, and if you think it appropriate, your support.

Thank you for your consideration.

Sincerely,

Edwin Hart
___________________________________________________________________

Document: 10646M/91-01
Date: 30 May, 1991
Subject: Summary of Results of Informal Meeting to Discuss Merging of DIS 10646 and Unicode into One Code
From: Edwin Hart, Moderator 10646M (Merger) Ad Hoc Group
Reply to: Edwin Hart
          Johns Hopkins University Applied Physics Laboratory
          11100 Johns Hopkins Road
          Laurel, MD 20723-6099
Electronic Mail: HART@APLVM.BITNET or HART@APLVM.JHUAPL.EDU
Voice: +1 (301) 953-6926
Facsimile: +1 (301) 953-1093

This document represents the first draft of what we hope will become a proposal to merge DIS 10646 and Unicode into one code. The primary advantage of this proposal is that it is built on the consensus of people supporting ISO 10646 and others supporting Unicode. We plan to submit a final consensus document to WG2 for consideration at the WG2 editing meeting planned for August, 1991 in Geneva, Switzerland. At that time, we plan to work within WG2 to refine the 10646 standard.

Summary

We affirm our strong support of the effort by ISO-IEC JTC1/SC2/WG2 to develop 10646. We believe that ISO, with its open and responsive procedures, will give careful consideration to our proposal to refine DIS 10646. In addition, we believe that the Unicode Consortium has provided valuable insight and technical solutions to newer requirements. We also believe that having a single international standard that incorporates the best features of DIS 10646 and Unicode as outlined in this proposal is far superior to having two incompatible standards with the same goal.

Therefore, after the completion of the May, 1991 ISO-IEC JTC1/SC2/WG2 meeting in San Francisco, California in the USA, the delegates attended an informal meeting. At the meeting, we discussed requirements to merge ISO-IEC DIS 10646 and Unicode. The people attending the informal meeting included some who favored the ISO 10646 code and others who favored Unicode. We believed that achieving consensus among these people would lead to a merger proposal more likely to be supported by ISO-IEC JTC1/SC2 and the Unicode Consortium.

In view of the diverse views represented at the meeting, the results are surprisingly positive. We succeeded in reaching a consensus on major design issues that had previously separated the DIS 10646 and Unicode codes and made them incompatible. We believe that this proposal paves the way for a merger of the best features of DIS 10646 and Unicode into one multi-octet code standard. Yet, this is merely a first step; further work and consensus are required to produce a final proposal.

In summary, although ISO and the Unicode Consortium have not yet endorsed this proposal, it is promising because it was the result of a consensus of many people who represented both the ISO 10646 and Unicode Consortium efforts. However, our work would have been almost impossible had it not been preceded by the excellent proposals submitted to WG2 by ECMA, Canada and China. To form our consensus, we used these proposals and new information on the Chinese, Japanese and Korean Joint Research Group (CJK-JRG) announced at the WG2 meeting in San Francisco. We believe this new proposal is very promising, and those attending agreed to work to build support for it within their respective companies and their national and industry standards bodies, including ECMA and the Unicode Consortium.

General Objectives

We adopted the following objectives for the group:

1. Create a proposal to merge the best features of DIS 10646 and Unicode such that the proposal is acceptable to both ISO and the Unicode Consortium.
2. Increase cooperation between ISO-IEC JTC1/SC2 and the Unicode Consortium.
3. Define action items and the timing to complete them.

Participants

Except for Mr. Jenkins, the following people participated in the Wednesday afternoon discussions:

Jerry Andersen          IBM, USA
Lloyd Anderson          Ecological Linguistics, USA
Joseph Becker           Xerox, USA
F. Avery Bishop         Digital, USA
Willy Bohn              University of Hanover, Germany
Mark Davis              Apple, USA
Asmus Freytag           Microsoft, USA
Joachim Friemelt        Siemens, Germany
Edwin Hart              SHARE Inc./Johns Hopkins University, USA
Masami Hasegawa         Digital Japan
Huang, Weimin           CESI, China
Olle Jarnefors          Royal Institute of Technology, Sweden
John Jenkins            Apple, USA
Bo Jensen               IBM Denmark
Mike Ksar               HP, USA
Takayuki Sato           HP Japan
Isai Scheinberg         IBM Canada
Karen Smith-Yoshimura   The Research Libraries Group, USA
Michel Suignard         Microsoft, France
J. G. Van Stee          IBM, USA
Kenneth Whistler        Metaphor, USA
Zhang, Zhoucai          CCID, China

On Thursday, Mr. Jenkins joined the group but Mr. Van Stee and Mr. Whistler were absent. In addition, Mr. Jenkins left before voting, and Mr. Hasegawa, Mr. Ksar, and Mr. Bohn were unable to stay for all the votes.

On Friday, except for Mr. Friemelt (who had to leave before we concluded the meeting), the following participated in the voting: Mr. Anderson, Mr. Bishop, Mr. Bohn, Mr. Freytag, Mr. Friemelt, Mr. Hart, Mr. Hasegawa, Mr. Jenkins, Mr. Sato, Mr. Scheinberg, and Mr. Suignard.

Advantages of Having Only One Multi-Octet Code Standard

Here is a list of advantages to having one global multi-octet code standard:

1. Why should we be concerned about two standards?
   a. Inevitable requirement to support both
      i.  10646 because it is an international standard
      ii. Unicode for compatibility with Unicode-based products
   b. Cost of supporting both
      i.  The cost to do both is probably very large
      ii. Must consider the costs to convert between the two
   c. Erosion of single code standard mind-set
      i.  If two, why not three? four? ten?
   d. Diminishes advantages of either alone without the other
      i.  Single code standard solves many problems that would not be solved if we have two or more of them
      ii. May introduce the requirement to switch between the two
2. The importance of de jure standards
   a. Increasingly used as procurement requirements
      i.  Gives customer more options for interconnection of products from different vendors
   b. Integral part of vast, interlocking family of standards, each assuming the others
   c. Better acceptance, because every country can participate
      i.  Not perceived as dominated by U.S.
3. Problems of code conversion
   a. Must identify both the source and the target code, but often no way to do this
   b. Conversion is application/subsystem dependent, and it often cannot be confined to one place (that is, it is much more expensive)
   c. Solving same problem in several places introduces probability of getting some solutions out of synchronization with others
   d. An uncontrollable, moving target (that is, you never own more than one of the two codes, you cannot control repertoires, etc.)
   e. Complicated by repertoire differences
   f. No right way to manage the differences
      i.  Mismatch can range from minor irritation to catastrophe
   g. Further complicated by differences in character semantics
      i.  No tested solution is known
      ii. At best, makes translation even more difficult
4. The costs of code conversion
   a. Monetary cost of developing, testing, maintaining, etc.
   b. Diversion of human and other resources by developers
   c. Performance and memory penalties (extra overhead)
   d. Errors and other problems are inevitable
   e. Customer dissatisfaction
   f. Customer conversion requirements will divert resources for creating local solutions
   g. Forces tradeoffs between satisfying installed base and meeting new market requirements
5. Other advantages
   a. One reference source for the code

Areas of Consensus

1. Remove the C0 and C1 restrictions.

We support the ECMA proposal, point 1, to remove the restriction on the so-called C1 space. This point is also included in the Canadian proposal and in other national body positions on DIS 10646, including the ones from China and the US.

Vote Thursday: 17 for/ 0 against/ 2 abstain (Davis, Freytag)

In addition, pending a careful review by computer communication, systems, and applications experts from ISO, ECMA, CCITT, and within our enterprises, we believe it desirable to allow encoding graphic characters in the C0 space presently reserved in DIS 10646. This refines point 2 from the Canadian proposal. Annex ____ provides more details on this refinement (the Bohn refinement, named for Willy Bohn, who proposed it) of the ECMA proposal.

Vote Thursday: 16 for/ 0 against/ 3 abstain (Bishop, Hasegawa, Sato)

Removing the C0 restriction in addition to removing the C1 restriction will provide flexibility by allowing the encoding of more characters in the base multilingual plane, which is the most important 2-octet plane for interchange and interworking. A consequence of removing the C0 restriction is that 10646 must change the way 1-octet control characters are encoded: the 1-octet control character is placed into the least significant octet of the current compaction method, and the most significant octets are padded to the width of the current compaction method. In addition, the 1-octet compaction method must be adjusted to ensure that the control characters are correctly handled.

2. Create an International Repertoire of Unified Chinese, Japanese, and Korean Ideographs and Encode This Set of Ideographs into the Base Multilingual Plane.

We propose a refinement to point 5 of the Canadian proposal.
   We believe that coding an international repertoire of unified Chinese, Japanese, and Korean ideographs in the base multilingual plane is mandatory for international interworking and processing efficiency. The encoding of the international C/J/K repertoire must be completed by the end of 1991. We propose to use the CJK-JRG results if they are available in 1991; otherwise we propose to use the best information available at that time.

   Vote Thursday: 17 for/ 0 against/ 1 abstain (Ksar), 1 absent (Hasegawa)

   Recent statements by the Japanese delegates to WG2 showed their strong support for the CJK-JRG. From this information, the group concluded that the unification of Chinese, Japanese, and Korean ideographs so highly desired by the international community is feasible. Provided that WG2 continues to recognize the stated Japanese requirement to encode its characters in its own 10646 plane, Japan recognized the need for an international repertoire of Chinese, Japanese, and Korean ideographs. A meeting of the CJK-JRG has been called (Tokyo, July, 1991) to start creating an international repertoire and ordering.

3. Allow the Option to Use Non-Spacing Marks.

   Pending careful review by ISO TC46 and CCITT, we propose to refine point iv) 2) of the ECMA proposal for floating diacritical marks as follows:

   The third Code Extension Level should specify:

   a. In addition to diacritics, non-spacing marks should include stress marks, tone marks, and those used for text processing operations such as underlining or mathematical notation for the name of a vector.
   b. Non-spacing marks should follow the base character for consistency.
   c. Imaging and the order of multiple non-spacing diacritics should follow well-defined rules. (See Annex ____.)
   d. To allow for compliance with future versions of 10646 that may encode additional pre-composed characters, allow encoding a character either as a pre-composed character or as a base character with one or more non-spacing marks.
      (That is, delete the ECMA statement "if the accented letter is already coded as a single character, the alternative representation by means of floating diacritical marks is not allowed.") This assumes that future revisions of 10646 will take certain characters that used floating marks in the current version of 10646 and encode them as pre-composed characters.
   e. All sequences of codes should be allowed because of the difficulty of enforcing legislation against certain sequences of code positions.

   Vote Thursday: 16 for/ 0 against/ 1 abstain (Sato)/ absent (Bohn, Hasegawa, Ksar)

4. Define the merger (10646M) of DIS 10646 and Unicode as a 4-octet code.

   Vote Thursday: 16 for/ 0 against/ 0 abstain/ absent (Hasegawa, Ksar, Bohn)

   We support the 4-octet definition of the merger of DIS 10646 and Unicode. Using 4 octets allows the flexibility needed to expand the code repertoire to meet all foreseeable requirements.

5. Location of Space for Presentation Forms

   We would support a drastic reduction or elimination of the presentation forms in the base multilingual plane while retaining codes necessary to transcode existing standards in plain text. People were concerned that DIS 10646 reserved too much unused code space in the base multilingual plane. A final determination of the presentation codes will be made in consultation with Arabic and other experts.

   Vote Thursday: 15 for/ 0 against/ 1 abstain (Becker)

6. Combine the Repertoires of DIS 10646 and Unicode into the Merged Code.

   We propose that the repertoire of the base multilingual plane of the merged code, 10646M, be derived from a superset composed of the union of the repertoires of DIS 10646 and Unicode; for example, the superset should include pre-composed Latin, Greek, Hangul, Vietnamese, and additional symbols.

   Vote Friday: 10 for/ 0 against/ 0 abstain

7. Simplify the Compaction Methods.

   We are concerned about the complexity of the DIS 10646 compaction forms.
   For simplicity, we propose that there be several parts to the standard:

   Part 1: General introduction, terminology, etc.
   Part 2: The base multilingual plane (BMP). This part of the standard will specify the 2-octet implementation of the BMP. Other parts are not required for conforming implementations of the BMP. This part may be implemented without announcers.
   Part 3: The full four-octet canonical form.
   Part 4: Mechanisms for other compaction methods to be determined.

   In the absence of other introducers for 10646 data, Part 2 should be assumed.

   Vote Friday: 10 for/ 0 against/ 0 abstain

8. Make Annex H Part of the 10646 Conformance statement.

   We recommend moving Annex H of DIS 10646 into the main body of the standard and making it a requirement for conformance.

   Vote Friday: 9 for/ 0 against/ 0 abstain/ 1 absent (Bohn)

Due to time limitations we were unable to discuss and make recommendations to resolve the following differences between DIS 10646 and Unicode.

9. Coding of Semantics versus Shape.

   For example, parentheses, brackets, and braces are coded as open/close in Unicode, and as left/right in DIS 10646.

10. Using Any Multi-Octet Coded-Character-Set Will Require Program Changes.

   The following two examples show that neither DIS 10646 nor Unicode may be blindly used with the C programming language.

   a. C Language Wide-Character (wchar_t) Model

      Padding ISO 8859/1 characters with the decimal 032 value precludes the direct use (without conversion) of 10646 compaction forms 2-4 as the wchar_t data type in the C programming language. This is point 3 in the Canadian position statement.

   b. NULL Characters in the C Language

      Unicode may use 000 as the first or second octet of the 2-octet code. The C language uses the NULL (000) octet as a character string terminator for 1-octet character data. Therefore, C programs must be rewritten to use Unicode.

11. Other Issues

   The above list of differences between Unicode and DIS 10646 is not exhaustive.
   Other lower priority issues also need to be considered.

Action Items to Promote the Agreement

1. Participants will lobby for this proposal with their country and company constituencies. (All, immediately)

2. Ask the Unicode Consortium member companies to place a discussion of this document on the agenda of the next Unicode Consortium meeting on June 7. The Unicode Consortium should formally state that it agrees or disagrees with the general direction and state any of its concerns with specific points. (Whistler)

3. Form a joint editing committee to help draft the final 10646 merged standard. (Freytag provides updated code tables, Hasegawa provides updated structure and text, 15 Aug. list the areas of the DIS 10646 document that would require changes)

4. For closer cooperation between ISO and the Unicode Consortium, we encourage the Unicode Consortium to pursue becoming a liaison member of JTC1/SC2, and for JTC1/SC2 to accept the Consortium as a liaison member. (Unicode Consortium, Aug., 1991)

5. Send this report to the national bodies and ask them to consider our consensus agreement in their votes on ISO-IEC DIS 10646. (Hart, 29 May)

6. Provide a list of the advantages of having one multi-octet code rather than two. (Andersen, done)

7. (Point 1) Coordinate an investigation of the impact of coding in C0. (Scheinberg, 15 Aug.)

8. (Point 2) Using formal minutes and other information, summarize the Tokyo CJK-JRG meeting. (Collins, 31 July)

9. (Point 3) Provide the Annex describing the rules to be used with multiple non-spacing marks. (Whistler, 9 June)

10. (Point 3) Coordinate review by ISO TC46 and CCITT of proposed use of non-spacing marks. (Smith-Yoshimura (TC46) and Friemelt (CCITT), Aug. 15)

11. (Point 5) Coordinate a review of the need to reserve so large an area for presentation forms for Arabic and other scripts on the base multilingual plane. (Ksar and Friemelt, 15 Aug.)

12. (Point 6) Investigate need for composed characters from Cyrillic and Polytonic Greek.
    (Why did WG2 include them in the DIS?) (Whistler, 15 Aug.)

13. (Point 7) Coordinate an investigation of which compaction methods to propose in Part 4. (Jarnefors, 15 Aug.)

14. Create 10646M electronic distribution list. Send electronic mail message to Hart to subscribe. (Hart, done)

(End of Document)

Edwin Hart Joan Winters 05/31/91 Draft Proposal to Merge DIS 1
========================================================================
Date: Fri, 31 May 91 09:50:38 EDT
From: Edwin Hart
Subject: Re: 10646M Minutes --Notes
To: Ken Whistler, "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM>
In-Reply-To: Your message of Tue, 28 May 91 15:36:56 PDT

Ken, thanks for your comments. Here is how I handled them.

>Here are my more picky notes on the draft, followed by a couple
>of more hefty substantive comments on two of the points regarding
>Areas of Consensus.
>
>I concur with Olle's comments re wording of point 6. C0-C1 restriction
>and 7. Non-spacing marks.
>
>Concur with comments circulating re removing the schedule of the next
>ad hoc meeting in Geneva from paragraph 5 of the Summary.
>

I did these and fixed the typos. Unfortunately, I did not fix Van Stee's name--he also sent me a note after I had the "final" reproduced. I have corrected it for the final August version.

>
>=====================
>
>Substantive fixes:
>
>Areas of Consensus 4., 2nd paragraph:
>

I changed the wording to say it would be a 4-byte code and removed the part about the Canadian proposal. I believe (and I could be wrong) that the intent was that 10646M would be a 4-byte code rather than to specify coding of the BMP. So far we have concentrated on the architecture and purposefully deferred discussing code assignments, including the number of the plane for the BMP. These decisions need to wait on the report on C0 coding.
>
>This is to be contrasted with the current DIS 10646, where the
>three values would come out to:
>Unicode U+0041 ==> decimal 65
>10646 032/065 ==> decimal 8257
>10646 032/032/032/065 ==> decimal 538,976,321
>
>Incidentally, this numerical values problem is not just numerology.
>Making the ASCII character value = the 16-bit character value
>= the 32-bit canonical form character value is a MAJOR help
>to conversion of existing software, and to my mind is the
>strongest argument by far for agreeing to abandon the C0 restriction.
>The second strongest has to do with value contiguity, range-checks,
>and table-size. Only the third level of the argument has to
>do with the overall coding space-size--and even that one is important!
>

You have made a good point here for removing the C0 restriction. Your examples show why you and several other people from Unicode (Joe Becker included) have been so concerned with multiple representations of the same character in DIS 10646. The programmer must be much more careful in handling the compaction methods. I would only use the compaction methods for storage and transmission. For processing, I would first "normalize" 10646 into either a 2-byte or 4-byte form.

>
>Areas of Consensus 11, NULL Characters in the C language
>
>
>What I am getting at is that all C code designed for 8-bit character
>interfaces has to be rewritten to handle multiple-octet codes. Unicode
>or DIS 10646 both have this problem. All multiple-byte character
>encodings have this problem. And ALL computer languages have this
>problem (Assembly, Cobol, Fortran, Forth, Pascal, Modula, C++, APL,
>Lisp, Snobol, SmallTalk, Eiffel, Icon, ...) --they are ALL broken if
>interfaces to handle strings in 8-bit units get handed strings with
>characters encoded in 16-bit or 32-bit units. They ALL need to be
>fixed.
>

I tried to capture the above thought when I edited the final draft. Isai said that I still failed to capture the right ideas.
However, we have not discussed it but can discuss it at our next meeting so that I can get it right. This was only the third mistake I made yesterday. I wonder where I'll get into trouble next?

I expanded the point that we need to have a statement from the Unicode Consortium saying that they approve or disapprove of the general direction of this 10646M group. (This is for the June 7 meeting.)

Also for Cyrillic pre-composed characters and Polytonic Greek, you need to review the SC2 and WG2 documents to find why they were included in the DIS and any other specific comments. I think the group needs to understand why they were included in the first place, what the arguments are for including them and the arguments for removing them. For the proposed merger, we need to decide what to recommend.

Thanks again for your comments.

Ed

Edwin Hart Ken Whistler 05/31/91*10646M Minutes --Notes
========================================================================
Date: Fri, 31 May 91 10:23:07 EDT
From: Edwin Hart
Subject: Re: Current Draft of Ad Hoc Meeting
To: Olle Jarnefors
In-Reply-To: Your message of Tue, 28 May 91 18:47:17 +0200

Action item (old) 14. Thank you for agreeing to this assignment. Use input from any sources you feel are appropriate including the ISO10646, Unicode, and 10646M LISTSERVes.

Best regards,
Ed

Edwin Hart Olle Jarnefors 05/31/91*Current Draft of Ad Hoc Meeti
========================================================================
Date: Fri, 31 May 91 11:18:54 EDT
From: Edwin Hart
Subject: Guess who did not have his name on the distribution list
To: "Issues with Merging 10646 and Unicode (10646M)" <10646M@JHUVM>

Until a few moments ago, I did not have my name on the 10646M list. Color my face red. I just obtained a copy of the log to see all of the messages. As the owner, I get all of the mail delivery error messages but none of the other data traffic. Lloyd, since I just saw your note now, I did not put any of it into the final document.
I must also apologize because in my haste I forwarded some personal mail to the distribution. As one trying to encourage trust between the ISO and Unicode people, I sure have made a big mess of it. I am sorry.

Since we have the list available, send mail to it for distribution. I will not redistribute mail sent directly to me unless you direct me to redistribute it.

Here is a list of the people now on the electronic distribution list:

*
* 10646M: Multibyte code working group
*
* Confidential= Yes
* Files= No
* Mail-via= Dist2
* Notebook= Yes,X1/201,MOnthly,Public
* Owner= HART@APLVM
*
*
* 10646M mailing list
*
* Location: JHUVM
*
* Purpose:
*
* For discussion of merging ISO DIS 10646 and Unicode into one
* global multibyte code.
*
ojarnef@ADMIN.KTH.SE                    Olle Jarnefors
HART@APLVM                              Edwin Hart
jenkinsj@APPLE.COM                      John Jenkins
Davis.Mark@APPLELINK.APPLE.COM          Mark Davis
ecoling@APPLELINK.APPLE.COM             Lloyd Anderson
Bishop@DECWET.ENET.DEC.COM              F. Avery Bishop
ksar@HPCEA.CE.HP.COM                    Mike Ksar
Takayuki_K_Sato%e2@HP8900.DESK.HP.COM   Takayuki K Sato
ma_hasegawa@JRDV04.DEC.COM              Masami Hasegawa
whistler@METAPHOR.COM                   Ken Whistler
andersen@RALVMK.VNET.IBM.COM            Jerry Andersen
bl.kss@RLG.STANFORD.EDU                 Karen Smith-Yoshimura
jvanstee@STLVM7.VNET.IBM.COM            J. G. Van Stee
schein@TOROLAB5.VNET.IBM.COM            Isai Scheinberg
microsoft!asmusf@UUNET.UU.NET           Asmus Freytag
microsoft!michelsu@UUNET.UU.NET         Michel Suignard
becker.osbu_north@XEROX.COM             Joe Becker
*
* Total number of users subscribed to the list: 17
* Total number of local node users on the list: 0
*

Edwin Hart Issues with Merging 05/31/91 Guess who did not have his na