This page contains notes recovered from old email archives, pertinent to decisions taken at early Unicode Working Group meetings from 1989 through 1990. These are not official records or meeting minutes, but commentary by contemporary participants in those meetings that sheds light on the kinds of decisions that were taken and how people understood them at the time. In some cases, proposed additions to the agendas of immediately upcoming meetings are also listed, on the presumption that those topics would have been discussed at those meetings.
Some of the entries are simply attributed short quotes snipped from the email archives. Other entries are more extended quotes of entire email messages, including headers. These lengthier citations often provide more context about the decisions taken, and explain more about the thinking at the time as the drafts were developed. Some of the extended quotes have sections of text elided. Such elisions are indicated with "[...]".
From Joseph_D._Becker.OSBU_North@xerox.com Wed Jan 9 19:35:27 1991
Date: Wed, 9 Jan 1991 11:36:46 PST
From: becker.osbu_north@xerox.com
Subject: Unicode Conformance Clause Proposal
To: u-core%noddy@Sun.COM
Cc: becker.osbu_north@xerox.com

Mark wrote a draft on Unicode Conformance a while back, but it seemed to me fairly discursive; I was imagining that we might want a concise normative statement. So, below is a draft of such a conformance clause, for discussion at this Friday's meeting.

----------------------------------------------------------------
CONFORMANCE

1. Interchange

Interchange refers to processes which transmit and receive (including store and retrieve) sequences of text characters. The conformance requirement for a Unicode system with regard to Interchange is:

> Except insofar as a system makes intentional changes in a character sequence consistent with the Unicode definition of the character semantics (e.g. case forcing, filtering, Compatibility Zone mapping), a conforming system must be able to retransmit the text as the same sequence of numerical code values that was received.

2. Presentation (aka Rendering)

Presentation refers to processes having access to fonts and other resources, which take in sequences of text characters and produce a visible graphic depiction of the text. The conformance requirement for a Unicode system with regard to Presentation is:

> For any given character sequence that a conforming system is able to render legibly, the graphic depiction must have a reading consistent with the Unicode definition of the character semantics.

3. Interpretation

Interpretation refers to processes which take in sequences of text characters and produce results based on the content of the text (e.g. spell-checking, indexing, transliteration). The conformance requirement for a Unicode system with regard to Interpretation is:

> For any given character sequence that a conforming system is able to interpret, the interpretation must be consistent with the Unicode definition of the character semantics.

----------------------------------------------------------------
Examples:

A conforming system may receive any text sequence and retransmit it unchanged. Whether or not it could have performed any other process on the text (e.g. display it, spell-check it, etc.) is immaterial.

A conforming system may receive a sequence of English text and retransmit it all converted to uppercase (presumably an intentional change consistent with the text's semantics).

A conforming system may NOT receive a sequence of English text and retransmit it all converted to random Bengali characters, or vice versa (presumably an unintentional change inconsistent with the text's semantics).

A conforming system may NOT take in a sequence of Unicode characters and treat it (i.e. present or interpret it) as though it were a sequence of ASCII bytes.

A conforming system may be unable to render a given character or sequence legibly (e.g. a system with only Latin font resources given a sequence of Bengali characters).

A conforming system may render a given sequence of English text in any Latin font style, line length, page layout, etc. of its choice, so long as the text is conventionally readable with the intended interpretation.
Discussion:

This design is predicated on the assumption that with a fully multilingual character set in use worldwide in various editions on all manner of equipment, it is impossible to preclude by legislation the common occurrence that a system may receive an unfamiliar character code, i.e. one that it is unable to present or interpret. Rather than trying to legislate away the commonplace as ISO does, we merely make one extremely simple provision for handling such cases: insisting on the ability to retransmit even unfamiliar codes unchanged. We thus define an entirely workable scheme that avoids the futility of the ISO conformance clauses.

Joe
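The Interchange rule above is easiest to see in code. The following is a minimal sketch (Python, with illustrative names; not part of the original email) of a process that satisfies it: it may apply an intentional, semantics-consistent change such as case forcing, but otherwise retransmits exactly the code values it received, including values it does not recognize.

```python
# A minimal sketch (not from the original email) of the Interchange
# requirement: a conforming process may transform text only in ways
# consistent with character semantics (here, optional case forcing),
# and must otherwise retransmit exactly the code values it received,
# including code values it does not recognize.

def interchange(received: list[int], uppercase: bool = False) -> list[int]:
    """Return the sequence of 16-bit code values to retransmit."""
    if not uppercase:
        return list(received)          # retransmit unchanged
    out = []
    for cv in received:
        if 0x0061 <= cv <= 0x007A:     # a-z: an intentional, semantics-
            out.append(cv - 0x20)      # consistent change (case forcing)
        else:
            out.append(cv)             # everything else passes through
    return out

# Example: Bengali and unfamiliar values are passed through verbatim.
assert interchange([0x0041, 0x0995, 0xE801]) == [0x0041, 0x0995, 0xE801]
```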
From whistler Mon Mar 11 16:41:47 1991
To: Becker.OSBU_North@Xerox.Com
Subject: Size & Shape of Unicode (Minutes of 12/14/90 WG Meeting)
Cc: unicore@Sun.Com, whistler@zarasun

O.k., Unicampers, I will pay the piper for not having written up the minutes of the meeting of December 14, 1990. Here are the Minutes [mainjuts] from the December 14 meeting regarding the question Joe raised. [Just the relevant section.]

Corporate Zone

There was a general discussion of the pros and cons of defining an explicit corporate zone. The four possible types of characters were classified as:

Normal characters
Compatibility Zone characters
Corporate Zone characters (in question now)
User characters

The problem was pointed out that there is no feasible way to define a vendor. This essentially makes it impossible to draw a clear distinction between general user characters and corporate use characters. Asmus proposed a compromise way of separating the two: let general user characters "grow" in one direction in user space, and corporate use characters in the other. This would not formally distinguish them, but would allow companies to separate them practically.

There was some discussion of how big the corporate use areas should be, and the group converged on a proposal to expand the user area by another 2K to accommodate the largest sets.

Result: User space will be 6K minus the Compatibility Zone at the top of Unicode; i.e. there will be a 5-1/2 K user space.

New user space definition: E800 - FDFF
Compatibility zone: FE00 - FFEF

Conventions of assignment: Corporate use characters start at FDFF and grow down. General user characters start at E800 and grow up.

====================================

Latefully submitted.

--Ken Whistler, Unicode Secretary.
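For reference, the ranges agreed in these minutes can be expressed as a small sketch. The code below (Python; the constant and function names are illustrative, not from the minutes) encodes the new user space, the Compatibility Zone, and the "5-1/2 K" size mentioned above.

```python
# A minimal sketch (naming is mine, not from the minutes) of the zone
# layout agreed at the 12/14/90 meeting: user space E800-FDFF, with
# general user assignments growing up from E800 and corporate-use
# assignments growing down from FDFF; Compatibility Zone FE00-FFEF.

USER_SPACE = range(0xE800, 0xFE00)          # E800..FDFF
COMPATIBILITY_ZONE = range(0xFE00, 0xFFF0)  # FE00..FFEF

def zone(code_value: int) -> str:
    """Classify a 16-bit code value against the 12/14/90 layout."""
    if code_value in USER_SPACE:
        return "user space"
    if code_value in COMPATIBILITY_ZONE:
        return "compatibility zone"
    return "other"

assert len(USER_SPACE) == 0x1600            # the "5-1/2 K" of the minutes
assert zone(0xE800) == "user space"         # first general user assignment
assert zone(0xFDFF) == "user space"         # first corporate-use assignment
assert zone(0xFE00) == "compatibility zone"
```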
At the 12/14/90 Unicode meeting we decided to try to set a date for the first Unicode Inc. Directors meeting.
—cited from email: Kernaghan to Unicore, December 18, 1990
From decwrl!metaphor!HQ.M4.metaphor.com!kernaghan Wed Dec 5 03:47:06 1990
Date: Tue, 4 Dec 90 14:22:30 PST
To: u-core@noddy.Eng.Sun.COM
Subject: Licensor Agreement for review by 12/14/90.

This is a draft of a License Agreement between Unicode and a Licensor to Unicode (a similar one is forthcoming between Unicode and the Licensee). Please review it and provide any feedback to me on-or-before the Unicode meeting of 12/14/90.

FYI - This draft was developed between Mark Davis, Mike Kernaghan, and a Metaphor lawyer as an action item Mike took from the last By-Laws meeting on 11/15/90. The plan is to try to have the License Agreements ready by the time we incorporate (although it is not required).

Although most of this is fairly standard wording, we do want to make sure it conveys the Unicode spirit of "spreading the standard far and wide". Therefore, please send me your comments for inclusion, and we will try to come to consensus on this agreement at the 12/14 meeting.

Mike Kernaghan - Metaphor
At the November 30, 1990 meeting, in reviewing what needed to be added to the charts for the Draft Standard Final Review Document, Becker proposed to make U+FFF0 - FFFE, as a group, "Special" characters, i.e. graphic character codes not in the Compatibility Zone. This was agreed to.
—cited from email: Whistler to Unicore, March 11, 1991
All my time until Friday is going to be devoted to the database update to get the cross-mappings back in sync with Microsoft. I'll bring hard copy of that (and soft copy) to Friday's meeting.
—cited from email: Whistler to Caldwell, November 28, 1990
At the November 16, 1990 meeting, it was agreed to add the REPLACEMENT CHARACTER at U+FFFE.
—cited from email: Whistler to Unicore, March 11, 1991
From whistler Mon Nov 19 13:14:10 1990
Date: Mon, 19 Nov 90 13:14:04 PST
From: whistler (Ken Whistler)
To: fdc@watsun.cc.columbia.edu
Subject: Re: line & paragraph separators

No, I wasn't talking about the Bremmer & Kroese "Waka waka bang splat", which I did enjoy, too. When we were discussing the proper name for apostrophe awhile back, I distributed the following to a number of people--though not to the entire unicode mailing list:

We set out to fix the apostrophes,
And avoid any coding catastrophes--
We'll take a new vote:
'APOSTROPHE-QUOTE!'
Requiescat in pacem these daft'strophes.

In any case, the mailer in question destroyed line breaks in both!

The Unicode meeting last Friday (16th) decided to add a LINE SEPARATOR and a PARAGRAPH SEPARATOR as distinct, unambiguous characters--basically my Proposal A from the earlier discussion on this. We also talked about the guidelines for conversion when converting ASCII-based code to Unicode. We agreed that it would be quite useful to have a standard enumeration of how to deal with common formats for lines and paragraphs (Unix, PC, Mac, ...). The first order conversion for control codes is simply to sign extend them to 16 bits (for both C0 and C1). That is really all that a Unicode conformant "device" should have to do. But for lines and paragraphs, there are a number of specific interpretations of various sequences of CARRIAGE RETURN, LINE FEED, FORM FEED, etc. which a Unicode application could convert to unambiguous codes, if it so desired.

Otherwise Unicode's intention is to leave the C0/C1 codes uninterpreted. They mean whatever an application intends them to mean. If some application wants to use a whole raft of specialized C1 codes from whichever ISO standard, it could, with the proviso that in Unicode text, the C1 codes are 16-bit extended (U+0080 .. U+009F) to conform with the 16-bit architecture of Unicode. (In earlier drafts of Unicode, we had omitted the C1 space, but sometime last spring it seemed advisable to vacate the C1 space and just let the semantics of those 32 positions be specified by the pertinent standards.)

On the other hand, I don't think that anyone intends that Unicode will be implemented (except in marginal ways) with character-oriented devices, which is part of the reason why Unicode is nearly silent about control codes. "Control sequences" are simply in another space, as far as Unicode is concerned, and text is not modeled as something which "controls" a device. Instead, a text store is acted upon by a rendering algorithm which maps it to a rendering device (typically a screen raster or a printer raster). The controlling language for the device itself (e.g. Display PostScript) has no direct relation to Unicode.

I'm sure that the fact that Unicode is not an 8-bit standard (unlike 10646) will hinder its acceptance on DEC terminals. But the first implementations will all be in bitmapped graphics workstation/PC platforms, and the implementors don't much care about controlling terminals. The considered opinion seems to be that the control-codes-in-text approach, however expanded, simply can't be scaled up to deal with the generic problems of multilingual software. The architecture is just not right for effective computerization of really multilingual software.

--Ken Whistler
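The "first order conversion" Whistler describes amounts to widening each 8-bit code to a 16-bit code value with no interpretation applied. Here is a minimal sketch (Python, with illustrative names; not from the email):

```python
# A minimal sketch (mine, under the assumptions of the message above) of
# the "first order conversion" for C0/C1 controls: each 8-bit code is
# simply widened to a 16-bit code value, so C0 lands at U+0000..U+001F
# and C1 at U+0080..U+009F, with no interpretation applied.

def widen_to_unicode(byte_string: bytes) -> list[int]:
    """Widen 8-bit text to 16-bit Unicode code values, one per byte."""
    return [b for b in byte_string]   # 0x00..0xFF map to U+0000..U+00FF

# CR LF stays an uninterpreted pair; an application that wants unambiguous
# separators may, if it chooses, map such sequences to the new
# LINE SEPARATOR / PARAGRAPH SEPARATOR characters instead.
assert widen_to_unicode(b"a\r\nb") == [0x0061, 0x000D, 0x000A, 0x0062]
assert widen_to_unicode(bytes([0x85])) == [0x0085]   # C1 NEL passes through
```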
From Joseph_D._Becker.OSBU_North@xerox.com Tue Nov 13 09:39:27 1990
Subject: 10,532 More Hangul Syllables in Forthcoming Korean Standards
To: u-core@noddy.Eng.Sun.COM

I'm not sure how public the following was intended to be, so it's probably best not to forward it, but in case you hadn't received this information I think we should add it to the agenda for the 11/16 meeting.

Joe

----------------------------------------------------------------
Date: 5 Nov 90 07:11:23 PST (Monday)
Subject: Progress of KS Expansion Project
From: ksri%halla.dacom.co.kr%uunet.uucp%autodesk.uucp%sun.Eng.Sun.COM@SGI:COM
To: Becker:OSBU North

Hi, this is Jeoung Sung Cho. I have received all messages distributed by the UNICODE people. Thanks for allowing me to get that valuable information. I am sending this message to inform you of the progress of the KS Expansion Project for Hangul and Hanja code.

The committee decided to define two supplementary sets. The first set, which we call the Hangul Supplementary Set, will include 8832 Hangul characters. This set and the 2350 Hangul characters of KS C 5601 will cover the entire set of Hangul characters that are currently used. The second set, which we call the Old Hangul and Hanja Supplementary Set, will include 1700 old Hangul and approximately 5000 Hanja characters. The committee is reviewing the second set now. The new sets will be announced as KS by the middle of 1991.

I wonder if you can assign 8832 Hangul characters in UNICODE. I know that it is very difficult to assign additional characters in version 1.0. If you can't assign the additional Hangul characters in this version, is there any possibility to assign those characters in the next version? The reason why we define the additional Hangul characters is that there is a very strong requirement from Korean users that KS should include all Hangul characters that are currently used.

If you need any additional information about the progress of the project let me know. Your prompt response to this message will be very much appreciated.

Sincerely yours,
Jeoung Sung Cho
KSRI
----------------------------------------------------------------
From whistler Tue Nov 13 14:15:28 1990
Date: Tue, 13 Nov 90 14:15:24 PST
From: whistler (Ken Whistler)
To: glenn@ila.com
Subject: Re: Unicode Consortium

It doesn't officially exist yet. That's what the Thursday meeting is about--to sort out any remaining legal problems in getting the thing incorporated. So far the Unicode "consortium" consists of those companies who show up at the regular RLG meetings. But once the Consortium exists, its formal members will be those companies which officially join and pay annual dues--like almost any consortium. We should have more details available this Friday, after the Thursday meeting sorts out whatever needs to be resolved.

--Ken
Bidi Algorithm
We'll have a subcommittee meeting on the 11th at noon in Redmond.
—cited from email: Freytag to Unicore, October 7, 1990
I am currently maintaining the database which contains the non-CJK part of the Unicode draft. At each Unicode meeting I bring updates of the database in the form of various TAB-delimited text files representing reports from the database. One of these is the master Unicode names list and another is the alphabetical listing of character names.
—cited from email: Whistler to mrfung, October 11, 1990
Michel's "topics": Looking at my notes I found: We raised 2 at the last meeting: - Statement on migration strateg(ies). What are you supposed to do with the Middle High Norse charcters that aren't in unicode - How to map from old standards: "Unicode allows to maintain intent" when there is no 1:1 mapping.
—cited from email: Freytag to Unicore, October 7, 1990
On September 14, 1990, at roughly 2 p.m., the Compatibility Zone was added with considerable reluctance, originally defined to be 511 codes: U+FE00 - FFFE.
—cited from email: Whistler to Unicore, March 11, 1991
Michel's "topics": Looking at my notes I found: ... From the previous meeting we had - Floating Diacritic handling
—cited from email: Freytag to Unicore, October 7, 1990
At the 9/14 meeting, we discussed producing an "Answers to the Top 10 Most Asked Questions" section, and I said I would distribute my old draft. It is below. I haven't re-read it since it was written 1.5 years ago, so undoubtedly it is out of date. Perhaps someone can use it as the basis for producing a modern version of such a summary.
—cited from email: Becker to Unicore, September 15, 1990
From daemon@Metaphor.COM Wed Oct 3 10:06:49 1990
Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with BSMTP id 6536; Wed, 03 Oct 90 08:27:06 EDT
Date: 03 Oct 1990 08:25:19 EDT
From: dan%ibm.com@CUNYVM.CUNY.EDU (Walt Daniels)
To: unicode@Sun.COM
Subject: order of floating diacritics

> As recently as the August 17, 1990, Unicode consortium meeting, this very topic was discussed and the policy that the order of multiple diacritics would not be specified was reaffirmed.
> J VanStee - private mail

I do not remember seeing this on the mailing list but I do remember a long discussion about the order being inside-out, top-bottom, etc. Were these just suggestions or will they be part of the standard?
From Joseph_D._Becker.osbunorth@Xerox.COM Mon Jul 30 17:38:32 1990
Subject: Re: Coding of accented characters
To: ma_hasegawa@jrdv04.enet.dec.com
Cc: Unicode@Sun.COM, Becker.osbunorth@Xerox.COM
In-Reply-To: "ma_hasegawa"%jrdv04.enet.dec.com%Xerox:COM's message of 22 Jul 90 23:19:12 PDT (Sunday)

Masami,

Thanks to your call for clarification, the Unicode meeting of 7/27 re-addressed the question of the ordering of multiple diacritical marks ... especially since I had mis-documented the group's previous decision (for which I had failed to find notes). Here is the correct statement:

----------------------------------------------------------------
... Sequence order of multiple diacritical marks: In the case of multiple diacritical marks applied to the same base character, if the result is unambiguous there is no reason to specify a sequence order for the mark characters. In particular, marks may fall into four categories: above the baseform, below the baseform, superimposed on the body of the baseform, and surrounding the baseform. Between two marks that are in different categories, there is never an ambiguity, hence never a need to specify sequence order. In the relatively rare cases where an unambiguous sequence order of multiple marks of the same category is necessary, that order should be: FROM THE BASELINE OUTWARD. ...
----------------------------------------------------------------

We had a fairly careful discussion of which was more beneficial:

(A) to specify a canonical order in all cases
(B) to leave the order flexible where possible

We decided that it really came down to a question of WHEN it was more efficient to filter the character sequence into canonical order:

(A) when the sequence is created (e.g. by input or editing), or
(B) when the sequence is interpreted (e.g. by a comparison routine).

It seemed clear to all of us that (B) is more effective, since it is nearly impossible to control all means of assembling character sequences, and the final interpretation is best left to the end-user routine anyhow. So, the correct semantics would be:

a) LATIN CAPITAL LETTER A + NON-SPACING MACRON + NON-SPACING DIAERESIS (3 characters) [indicates the diaeresis above the macron]
b) LATIN CAPITAL LETTER A + NON-SPACING DIAERESIS + NON-SPACING MACRON (3 characters) [indicates the macron above the diaeresis]

Meanwhile, in response to my message of 23 Jul 90 17:23:57 PDT concerning 10646 support for Rhade, etc., you replied 23 Jul 90 21:33:10 PDT:

>> As for missing characters for 10646, we have been allocating additional characters based on request with justifications. If you think some characters are missing, you should submit the request through the established process like everyone else (through ANSI) to ISO.

I have been trying since that time to get you to confirm my understanding of this reply. My understanding is that 10646 is unable to represent

> extensions of the Latin script
> marked symbols for mathematics & physics
> the International Phonetic Alphabet
> pointed Arabic & Hebrew
> Hindi & Sanskrit (and by extension all South and Southeast Asian scripts)

UNLESS each and every possible combination of base characters and marks is submitted for registration through ISO. Is the above statement correct or not?

Meanwhile, in your message of 29 Jul 90 18:39:40 PDT, you said:

>> ISO 10646 can be used with a control code standard (ISO 6429). In ISO 6429, there is a control function GCC (Graphic Character Composition). So for THOSE APPLICATIONS which need to combine graphic symbols, there is a way.
Now I am extremely curious. Does 10646 plain text need to be encoded differently for some applications than for others? Have you never had the experience of transferring text between two systems or applications that were not designed to expect such a transfer? It seems to me that such "blind" transfer is a normal everyday part of text interchange, especially in systems integrated from multiple-vendor components.

I do not understand whether the GCC control code is permitted, or optional, or mandatory, for each of the cases:

> extensions of the Latin script
> marked symbols for mathematics & physics
> the International Phonetic Alphabet
> pointed Arabic & Hebrew
> Hindi & Sanskrit and other South / Southeast Asian scripts

Please make that clear to us in each case.

Finally, it would be valuable to return with this new knowledge to your original question:

>> What I want to know is the "correct" representation of, say LATIN CAPITAL LETTER A WITH MACRON AND DIAERESIS, a character needed for Lappish.

Now we are asking this question with regard to ISO 10646, assuming that the Lappish character is not already registered (I don't think it is in 2nd DP), and (I guess) using the GCC control code. Possible representations are:

a) LATIN CAPITAL LETTER A + GCC + NON-SPACING MACRON + GCC + NON-SPACING DIAERESIS (5 characters)
b) LATIN CAPITAL LETTER A + GCC + NON-SPACING DIAERESIS + GCC + NON-SPACING MACRON (5 characters)
c) LATIN CAPITAL LETTER A WITH DIAERESIS + GCC + NON-SPACING MACRON (3 characters)
d) LATIN CAPITAL LETTER A WITH MACRON + GCC + NON-SPACING DIAERESIS (3 characters)
f) NON-SPACING MACRON + GCC + NON-SPACING DIAERESIS + GCC + LATIN CAPITAL LETTER A (5 characters)
g) NON-SPACING DIAERESIS + GCC + NON-SPACING MACRON + GCC + LATIN CAPITAL LETTER A (5 characters)
h) NON-SPACING MACRON + GCC + LATIN CAPITAL LETTER A WITH DIAERESIS (3 characters)
i) NON-SPACING DIAERESIS + GCC + LATIN CAPITAL LETTER A WITH MACRON (3 characters)

Which of the above representations are valid? (Also please indicate which ones specify whether the macron is above or below the diaeresis).

Thanks,
Joe
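The 7/27 decision quoted earlier in this message can be illustrated with a small sketch. The code below (Python; the category names and the function are illustrative, not from the email) groups encoded marks by category and preserves the baseline-outward order within each category.

```python
# A minimal sketch (categories and names are illustrative, not from the
# email) of the 7/27 decision: marks in different categories never need
# an ordering, while marks in the same category are interpreted FROM THE
# BASELINE OUTWARD, i.e. the first-encoded mark sits closest to the base.

from collections import defaultdict

ABOVE, BELOW, OVERLAY, SURROUND = "above", "below", "overlay", "surround"

def stack_order(marks: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group encoded marks by category, preserving baseline-outward order.

    `marks` is a list of (mark_name, category) pairs in encoded order.
    """
    stacks: dict[str, list[str]] = defaultdict(list)
    for name, category in marks:
        stacks[category].append(name)   # encoded order == baseline-outward
    return dict(stacks)

# 'A' + macron + diaeresis: the diaeresis is rendered above the macron.
assert stack_order([("MACRON", ABOVE), ("DIAERESIS", ABOVE)]) == {
    ABOVE: ["MACRON", "DIAERESIS"]      # closest to the base letter first
}
```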
From BOSURGI1@AppleLink.Apple.COM Thu Aug 2 18:09:22 1990
Cc: AUTH1@AppleLink.Apple.COM (Auth, Michael,CLA), RICHARDSON7@AppleLink.Apple.COM (Richardson, Lee,CLA)
Subject: Precomposed, compatible, etc.
To: U-CORE@NODDY.Eng.Sun.COM
From: BOSURGI1@AppleLink.Apple.COM (Bosurgi, Joe,CLA)
Date: 03 Aug 90 00:28 GMT

> I agree with both Glenn and Lee on this subject. All our mappings between existing PC standards and Unicode have been using precomposed characters. It is not even obvious for me that the first system implementations of Unicode should have full floating diacritic support, including:
> - collating,
> - rendering, etc...
> The precomposed character set included in Unicode will cover 99% of our current need and it will be difficult to justify the large investment required by the full support RIGHT NOW. I understand that we have to add the full support later, but this looks like a medium term goal for me.
> Michel Suignard, Microsoft

We've been having some discussions here on the European pre-composed/full floating diacritic issue, and have been leaning in the same direction that Michel has so eloquently expressed above. This in no way indicates that we want to do away with floating diacritics or anything like that. But it seems a mistake to take the floating diacritic version of a character to be the standard, or even "preferred", way of representation, while denigrating the pre-composed form. I think that many developers will want to transition to Unicode WITHOUT implementing a generalized look-ahead function for handling floating diacritics "in the first release". As Michel notes, this will "cover 99% of the current need". I _do_ think it is inevitable to move to the full floating diacritics and a generalized look-ahead implementation at some point, but I'm not sure what we'd gain (except resistance) by requiring this initially.

Not only does this impinge on pre-composed European characters, but also on IBM's suggestion of re-opening the "compatibility zone" at our last meeting. Specifically, the only mechanism available for getting isolated Arabic letters, zenkaku Roman characters, and hankaku katakana at present involves zero-width non-joining characters, and the hankaku/zenkaku "diacritics". The same argument might be applied here. Of course, Arabic and Indic scripts will always involve look-ahead for rendering - but there are probably some developers out there that are only interested in Europe, the Americas, and East Asia "for now". Despite not wanting to "cover the earth" all at once, they can further the quick acceptance of Unicode. And for them, this will "cover 99% of the current need" for these characters at a lower initial cost.

I'd still support the idea of the zero-width non-joiner in a "compatibility zone" scenario. The zenkaku/hankaku "diacritics" could probably be tossed. We can work with what is currently defined, but feel that IBM's suggestion has merit. In terms of the "denigration" issue for European characters, we agree with Michel.

Joe Bosurgi
Claris Corporation
I, too, am engaged in coding case relations and character properties. It would be nice if we agreed about most of these things, but I don't think there is any plan to have Unicode 1.0 also publish a full list of character properties. The names list (Joe Becker has completed a draft) will be invaluable for character identification, and it also contains information about case pairs. Joe should have a corrected draft of that available at the May 18 meeting.
The Apple FileMaker database has some character properties (direction and major class [letter vs. symbol vs. numeric vs. punctuation]) coded, but it is only partially up-to-date. Perhaps we should bring up, as an agenda item for the May 18 meeting, the coordination of efforts to systematically agree upon and develop lists of at least those character properties.
—cited from email: Whistler to Freytag, May 10, 1990
From BL.KSS%RLG@Forsythe.Stanford.EDU Wed May 2 12:04:40 1990
Date: Wed, 2 May 90 11:56:54 PDT
To: u-core@noddy.Eng.Sun.COM
Subject: Mtg on May 4? (Inquiring Minds Want to Know)
Re: What Character Names to List

U-Core Folks --

1. PLEASE let me know 1) if there IS a 5/4 mtg; 2) if it will be here at RLG; 3) what times it will be. (If we're not meeting at RLG, I'm obliged to release the room.)

2. Although I have no opinion/advice on the Hamiltonian vs Nabla topic, I do have an opinion on the general question of what names you list for a given character: AT MINIMUM, reference any other name used in a character set standard. (On a case-by-case basis you may consider including names that are "well-known" but not documented in a standard; if the character does not appear in any standard you may have no choice but to use the "well-known" name. But if the character appears in more than one standard under different names, then I would argue for cross-referencing the names in those standards.)

Karen
I vote to wait until someone willing to implement Unicode shows us that they really need the Mosaics before we put them back. The lesson here is that people interested in the content of Unicode must make an effort to attend the meetings, and if that is not possible, to at least read the mail and minutes. We did vote, and no one voted in favor.
[[This refers to a decision taken during the April 20, 1990 meeting to remove the Videotex Mosaics from the Unicode draft. That decision engendered an extended discussion on the Unicore list on May 22, 1990.]]
—cited from email: Collins to Unicore, May 22, 1990
From Joseph_D._Becker.osbunorth@Xerox.COM Thu Apr 26 10:35:09 1990
Sender: "Joseph_D._Becker.osbunorth"@Xerox.COM
Date: 26 Apr 90 09:22:34 PDT (Thursday)
Subject: Send 'em in
From: Becker.osbunorth@Xerox.COM
To: lcollins@apple.COM, BOSURGI1@applelink.apple.COM, BL.KSS%RLG@Forsythe.Stanford.EDU, microsoft!michelsu@Sun.COM, glennw@Sun.COM, zarasun!whistler@metaphor.com, James_Higa@NeXT.COM, BR.JMA%RLG@Forsythe.Stanford.EDU
Cc: Becker.osbunorth@Xerox.COM

... those alphabet names lists and section introductions that folks said they'd write at the 4/20 meeting. Let's aim for early next week. I have written drafts for Diacritics, Greek, Cyrillic, Georgian, Armenian, Arabic, and Ethiopian, and it'd be nice to try to have a whole package for the May 4 meeting. (Is anyone calling the May 4 meeting?)

Joe
From Joseph_D._Becker.osbunorth@Xerox.COM Mon Apr 16 10:23:19 1990
Date: 15 Apr 90 16:24:55 PDT (Sunday)
Subject: Re: Miscellaneous characters
From: Becker.osbunorth@Xerox.COM
To: microsoft!michelsu@Sun.COM
Cc: lcollins@apple.COM, zarasun!whistler@metaphor.com, Becker.osbunorth@Xerox.COM
In-Reply-To: microsoft!michelsu%Sun:COM's message of 10 Apr 90 18:59:12 PDT (Tuesday)

Hello again,

I agree with all comments, including:

SM720000 = 0x21b5 (bent arrow / Enter (Return) symbol)
JX710000 = 0x309b (daku-on)
JX720000 = 0x309c (han-daku-on)
JQ740000 = 0x00b7 (middle dot)

... and the fact that SP500000 was among the list of IBM symbols that I took back out because I was not sure which ones were useful. It sounds like we need to add SP500000 back into Unicode. We can confirm this and a few other additions at the meeting next week.

Joe
We are getting significant pressure here to have more dingbats in Unicode. In particular, did you already look at the ITC ZAPF DINGBATS series 100, 200 and 300? Did you develop a standard Unicode position about how to transport them in a Unicode string? I would like that matter to be discussed at the next Unicode meeting. (I am personally fairly reluctant to add random dingbats, as it is not clear then that you wouldn't have to add zillions of them used in European or Asian publications, but again, if these ZAPF DINGBATS are widely used we need at least an explicit position about their support.)
—cited from email: Suignard to Collins, April 16, 1990
Would like to put onto tomorrow's (3/23) agenda, for 2:00 pm, a discussion of whether there is a need for right-to-left punctuation in Unicode. (We need to give a specific starting time for this topic, because there are RLG staff who will be joining us specifically for this discussion.)
—cited from email: Smith-Yoshimura to Unicode, March 22, 1990
Something you may also need is the mapping tables between Codepages and our UGL; however, this represents a lot of information to be sent by mail. If you have some specific ones I can do it, or we may as well wait for our next Unicode meeting (March 23rd), when I can bring soft and hard copies.
—cited from email: Suignard to Becker, March 8, 1990
From Joseph_D._Becker.osbunorth@Xerox.COM Mon Mar 19 21:52:43 1990
Date: 19 Mar 90 21:46:06 PST (Monday)
Subject: Unicode Architecture Proto-Decisions
From: Becker.osbunorth@Xerox.COM
To: Unicode@Sun.COM
Cc: Becker.osbunorth@Xerox.COM

An impromptu micro-Unicode meeting was held today to discuss two smallish Unicode architectural proposals that we thought we could (and did) get agreement on. We present them here in the hopes that the group will rubber-stamp them (yeah, sure) on Friday. Although, if the following folks can agree on something, it must be pretty agreeable (-: Whistler, Morse, Kernaghan, Collins, Bosurgi, Becker.

> Proposal: C1 pullout: Designate the 32 "C1" cells in the range 0080-009F as "Control" (interpretation unspecified); distribute 30 of the characters now in this range to appropriate blocks of punctuation, math operators, etc.; zap the duplicated script-f and pi

Con: We had already decided against this (see below); however, the situation has somewhat changed.

-------------------------------------------------------
> C1 pullout: leave unassigned the "C1" range 0080-009F
Decision: Status Quo, i.e. leave the 32 miscellaneous letters and stuff there.
Reason: We might have been willing to concede if we thought that C1 had any widely-accepted standard semantics (as C0 does), but since we don't, there's no point in just leaving the space open.
-------------------------------------------------------

Pro: This will make the first 256 Unicodes PRECISELY identical to Latin1, thereby making Unicode more acceptable to many people. We have learned that ISO is indeed playing control code games in the C1 space; we have no pressing reason to prevent them from doing so. For example, some ISO DIS 10538 & DP 10646 C1 control functions:

-------------------------------------------------------
* indicates DP 10646 C1 Set
80*  PAD   PAD OCTET
81*  HOP   HIGH OCTET PRESET
82   BPH   BREAK PERMITTED HERE
83   NBH   NO BREAK HERE
85   NEL   NEXT LINE
8B   PLD   PARTIAL LINE FORWARD
8C   PLU   PARTIAL LINE BACKWARD
99*  SGCI  SINGLE GRAPHIC CHARACTER INTRODUCER
9B   CSI   CONTROL SEQUENCE INTRODUCER
-------------------------------------------------------

The main people who had wanted the set of 32 characters that we put in these cells were Apple, but they now feel that there would be greater benefit to them in making Unicode more acceptable. If we make this change now, and regret it in half a year, very little damage would be done; but if we DON'T make it now, and regret it in half a year, we'd be badly stuck.

> Proposal: Move User Space: Move the User range 3000-3FFF to F000-FFFE; move up CJK Aux to 3000-3FFF, and move up Han to start at 4000.

Con: None except for some hassle to make the change.

Pro: Not much gain either, but might give us greater flexibility if we need to tweedle User Space in the future (for example, we're told the max IBM user space is 6K, and we would be able to expand ours from 4K to 6K if we felt like it). This is different from the old "Compatibility" proposal, although it does permit us to refer to User Space as F-Space ...

Joe
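The "Pro" argument for the C1 pullout, that the first 256 Unicode code values become precisely identical to Latin-1, can be checked with a small sketch (Python; the function name is illustrative, and the check relies on a modern Latin-1 decoder rather than anything from the email):

```python
# A minimal sketch (mine) of the property the C1-pullout proposal buys:
# with 0080-009F reserved as controls and the Latin-1 graphics left in
# place at 00A0-00FF, the first 256 Unicode code values coincide with
# ISO 8859-1, so Latin-1 text widens to Unicode by value alone.

def latin1_to_unicode(data: bytes) -> list[int]:
    """Widen Latin-1 bytes to 16-bit Unicode code values."""
    return list(data)

# Every Latin-1 byte value maps to the Unicode code point of the same
# number (a modern Latin-1 decoder agrees byte for byte).
assert all(
    latin1_to_unicode(bytes([b]))[0] == ord(bytes([b]).decode("latin-1"))
    for b in range(256)
)
```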
I understand from an informal meeting earlier this week that there will be separate code points for the KS "phonetically-distinct" hanja, to guarantee a one-to-one mapping to KS for text marked as Korean. If so, we like it. What were the statistics on how many there were? Maybe someone can present this at Friday's meeting.
—cited from email: Bosurgi (Claris) to Unicode, March 21, 1990
As a follow-up to the last Unicode meeting I took responsibility to produce a mapping from some IBM code pages to Unicode.
—cited from email: Suignard to Unicode, March 1, 1990
I think we really need to get more serious about having a meeting schedule determined in advance - announcing whole-day meetings over E-mail on just a few days' notice is unreal. I seem to recall Michel Suignard suggesting reserving one day out of each month (or two weeks, or some regular interval) and attempting to have meetings on a *regular* basis, instead of just setting them up in an ad hoc fashion. That would be a lot easier.
—cited from email: Bosurgi (Claris) to Unicode, February 19, 1990
This is to confirm that RLG will be hosting the next Unicode meeting this Friday, Jan 26, in the conference room on the first floor called "OAK". (Ask the receptionist at the front desk for directions when you come in.)
I leave the actual agenda to others. Although we said we'd have a "full day meeting", RLG staff cannot attend before 9:30 am, so let's start then. (Wayne Davison, Associate Director for Development, will also be attending from RLG central staff, and will be there before me.)
—cited from email: Smith-Yoshimura to Unicode, January 22, 1990
I cannot make the Jan 26th meeting, but Asmus Freytag from my team will go and also eventually David Wood (National Language Support Program Manager). Unicode is getting a lot of attention in Microsoft and is clearly in good position to be used as an internal unique character representation.
—cited from email: Suignard to Unicode, January 17, 1990
At 7:00 pm on Friday, those present decided to extend the technical discussion of non-CJK and other issues to another meeting on Monday, 29 January. This meeting will be held at Claris (5201 Patrick Henry Drive, Santa Clara), beginning at 5:00 pm. I will try to arrange for food, but given the short notice we might have to fax an order to Togo's.
—cited from email: Bosurgi (Claris) to Unicode, January 28, 1990
From ksar@hpcea.ce.hp.com Wed Jan 31 16:37:41 1990
To: microsoft!michelsu@Sun.COM
Date: Wed, 31 Jan 90 8:43:38 PST
Subject: Re: Re Compatibility Space
Cc: unicode@Sun.COM
In-Reply-To: Message from "michelsu@Sun.COM" of Jan 30, 90 at 7:49 pm

[...]

I like this idea of a compatibility block in Unicode, and it appears that it could resolve the impasse we had at the last meeting. It does not have to be in the "primary space", but there is a need for it. The issue now is what to include in it. Why not start with what the 10646 2nd DP has and provide feedback to SC2/WG2, through ANSI/X3L2, on what should be added to or deleted from the 2nd DP of 10646?

Regards,
Mike KSAR/HP
From lcollins@apple.com Fri Dec 15 17:24:00 1989
Date: Fri, 15 Dec 89 17:16:20 PST
To: microsoft!michelsu@Sun.COM, unicode@Sun.COM
Subject: Re: Dec 18th meeting

Michel,

Apple has always assumed that in the worst case we would have to go it alone with Unicode rather than accept a bad standard. If there is a vote to be taken, then we will opt for Unicode, since I doubt that we can rely on 10646. I think most of the Unicoders agree. We are counting on the final freezing of Unicode in early 1990.

I think a formal meeting once a month may now be justified, since you and others have to come from outside the area. Please get us a list of the discrepancies in mapping and any missing symbols as soon as you can.

Lee
From Joseph_D._Becker.osbunorth@Xerox.COM Fri Dec 15 19:09:44 1989
Date: 15 Dec 89 19:01:50 PST (Friday)
Subject: Re: Dec 18th meeting
From: Becker.osbunorth@Xerox.COM
To: microsoft!michelsu@Sun.COM
Cc: unicode@Sun.COM, Becker.osbunorth@Xerox.COM
In-Reply-To: microsoft!michelsu%Sun:COM's message of 15 Dec 89 16:59:21 PST (Friday)

Michel,

[...]

Re: meetings more formal with a longer agenda

> Well, if/when we attract the money and personnel, we might envision genuine Unicode conferences, teach-ins, even the greatly-to-be-desired Unicoeds ... But as things stand, I think the main technical work will be established before we get formally organized. You have raised many interesting issues, so I'm glad you can at least participate by E-mail on days when you can't justify a trip down here. Until we get formal, we would certainly be glad to build a meeting around any other opportunity you might have to visit the Bay Area.

Joe
From glennw@Sun.COM Thu Dec 21 15:11:41 1989
Date: Thu, 21 Dec 89 11:37:04 PST
From: glennw@Sun.COM (Glenn P. Wright)
To: microsoft!michelsu@Sun.COM
Subject: Re: Dec 18th meeting
Cc: unicode@Sun.COM

| I think we should still give a try to converge ISO 10646 and Unicode, but let's say after late March or early April we have to proceed.

I agree with you. I assume here you mean "finish Unicode", followed by "Discuss merge issues". I don't think we have ANYTHING to say until we know we have a draft. I'm sure X3L2 are sick of listening to us bemoaning 10646 when what we have is not in final draft form.

| Re: meetings more formal with a longer agenda
| No, I am not asking for full blown conference but more like a monthly meeting which may last a bit longer with a predetermined agenda.

I agree. I believe we need to commit more time in January to stop the thrashing we seem to be doing. We urgently need to get written consensus on decisions. I propose that our next meeting be a whole day meeting sometime second or third week of January. I would like to see us close all issues on non-CJK (including symbols) at that meeting. (Yes, I know most work is done outside meetings, but....)

Glenn.
From BOSURGI1@applelink.apple.com Thu Dec 21 19:41:50 1989
Date: 21 Dec 89 18:58:00 PST
From: BOSURGI1@applelink.apple.com
To: unicode@Sun.COM
Subject: January Meeting

Hi ... This is just to second Glenn's recent suggestion of an all-day meeting next month to finalize non-CJK for this revision. I think we really need to close these points as soon as possible, maybe even start some test implementations using Unicode, see how transmission of a subset of Unicode feels, etc., etc. That feedback would be valuable to get before finalizing the whole wad, and would allow us to bring any hidden flaws in our assumptions (especially in handling of diacritical marks) to the surface quickly.

I think the problems we have been pointing out in 10646 have received more acknowledgement recently, and we should take the opportunity soon to start working with (as well as on) what we think is a much more viable standard, to demonstrate Unicode's relative merits concretely. It could be dramatic to compare development time, testing time, and/or execution time of some sample international routines which used Unicode, or a representative subset, with other possible methods. A cross-script, international "Find/Change" routine immediately comes to mind as an example.

Yoi o-toshi-o y'all,

Joe Bosurgi
Claris Corporation
Manager, Software Internationalization
From lcollins@apple.com Tue Nov 28 13:50:57 1989
Date: Tue, 28 Nov 89 13:26:55 PST
To: unicode@Sun.COM
Subject: Letter to ANSI

Here is my first crack at the letter we discussed at the Unicode meeting last night. Please review it and let me know if it is acceptable to use your name and/or your company's name in the final draft to ANSI.

Lee

------------------------------------

Subject: Flaws in ISO DP 10646
Status: Industry Group Position
Action requested: Consideration by X3L2

As software and computer systems producers attempting to meet growing international requirements, we would like to express our concern with the unsatisfactory direction being taken in the development of the ISO multi-octet character code standard, DP 10646. The failure of ISO SC2 WG2 to incorporate the modifications to DP 10646 proposed in X3L2/89-195 threatens to render 10646 unacceptable as an internal process code. Specifically, we are concerned that DP 10646 is marred by three serious flaws:

1. DP 10646 places unnecessary restrictions on the encoding of graphic characters. This forces the use of 24 or 32 bit characters, even though a fully coded 16 bits is more than sufficient for representing all but the most obsolete and obscure of the world's characters. We do not find compelling the argument that this allows backwards compatibility with 7 and 8 bit terminals, since it is obvious that existing hardware will require major revisions to adequately handle the character code repertoire made available by DP 10646.

2. If 10646 fails to establish unification of the Han characters, then it will not be possible to represent the standard Han characters used by Chinese and the proposed extensions for Japanese within the much sought after property of the 16-bit Basic Multilingual Plane. Any standard that discriminates against such a large segment of computer users is clearly unacceptable.

3. If 10646 is implemented to allow large numbers of presentational forms in the basic multilingual plane, this will confuse the highly desirable distinction between text content (character codes, the jurisdiction of SC2) and form (glyph identifiers, the jurisdiction of SC18). In practice, this will mean that we have to recognize multiple encodings of characters while gaining nothing, since it is neither possible, useful, nor practical for a character code standard to specify all possible presentation forms of a character. Moreover, it is irresponsible to allow large numbers of glyphs to be defined within a character code space already much reduced by the above restrictions on graphic character encoding and multiple encodings of Han characters.

As a result of these flaws, unless we implement the extravagant and unnecessary 32 bit characters, we will be forced to live with the variable-width encodings, losing the advantages of a fixed-width encoding already noted in X3L2/89-195. Frankly, this result is no better than the current state of the world in multilingual computing. Moreover, it is not clear that 10646 even represents a major advance over ISO 2022. As a result, we foresee the development of a de-facto industry standard encoding based on fixed-width, 16 bit characters.

----------
From Joseph_D._Becker.osbunorth@Xerox.COM Fri Nov 17 12:04:37 1989
Sender: "Joseph_D._Becker.osbunorth"@Xerox.COM
Date: 17 Nov 89 11:34:22 PST (Friday)
Subject: 11/16 Decisions on Unicode Architecture & "Symbols"
From: Becker.osbunorth@Xerox.COM
To: Unicode@Sun.COM

Those who, despite the untimely death of E-mail, actually attended yesterday's meeting held some pretty thorough pro-and-con discussions on the various issues, and made decisions summarized as follows:

ARCHITECTURAL STATUS QUO'S:

> NUL bytes: leave unassigned any codes ending in a 0 byte
Decision: Status Quo, i.e. "no code ranges or byte values are systematically excluded from use".
Reason: Unitext is a new type that can't be interpreted in any "8-bit mode" anyhow, so there's no gain in acceding to misguided 8-bit thinking.

> C1 pullout: leave unassigned the "C1" range 0080-009F
Decision: Status Quo, i.e. leave the 32 miscellaneous letters and stuff there.
Reason: We might have been willing to concede if we thought that C1 had any widely-accepted standard semantics (as C0 does), but since we don't, there's no point in just leaving the space open.

> Moving unassigned alpha-symbol space
Decision: Status Quo, i.e. leave the overall allocations just as they are.
Reason: We decided that we do NOT intend all code points after 5000 to be exclusively Kanjilando, i.e. we ARE willing to put non-Han characters at later code ranges as things overflow the currently-assigned regions. Then, for "symbolic characters" (see below), we might as well make the "didactic" assignment which states that there won't be more than 4K symbols that we approve of. We discussed doing the same thing for "alphabets" or non-CJK scripts (shrinking that block to 4K), but decided that in a character code standard it seemed appropriate to allocate a generous 8K up front to scripts. Hence, we arrive back at the current structure, but with a better feel for what we mean by it.

> Nuke the umgels (new item): remove the absurd 2,500-4,000 prefab Korean umgels, [...]
Decision: Not terminally decided, but so far willing to remain Status Quo. [...]

ARCHITECTURAL CHANGE:

> Adopt 8859 alphabet structures: restructure those alphabets that have national standard encodings to use the arrangement of those encodings insofar as possible
Decision: For those alphabets where national standards exist, including the 8859/n sets, change back to using those arrangements. Existing holes will remain as holes, and a few new holes will be created when we zap out duplicates of characters that are already coded somewhere else. (By the way, for better or worse this is the identical approach taken by 10646, so we should look at the 10646 layouts as well as the old standards.) Our extension letters will be added afterwards, starting at the first available multiple of 16. We may also want to add a bit more expansion space after the end of some alphabets.
Reason: It's just not sensible to enrage everyone in half a dozen nations for no particular technical gain. If they want to make a disaster area out of their own alphabet, that's their privilege. The position we took with ANSI & ISO is that we are NOT trying to design the world over from scratch because we're smarter than everyone else, but rather that we have a few sacred principles (e.g. 16 bit encoding) and are trying to weave together existing standards except where doing so would trash those principles (e.g. non-unified CJK). [...]
"SYMBOLIC CHARACTERS": Ken suggested we abandon the word "Symbols" for the phrase "Symbolic Characters", which we liked as a more precise expression of what we intend. The list of criteria was expanded slightly, although it still needs work: Criteria for inclusion: > If the symbol itself has a name, e.g. "ampersand", "hammer-and-sickle", "one-snake caduceus" >If the symbol is commonly used amidst text, e.g. the Japanese ZipCode-san face that is on the inside cover of the JIS standard but not among the JIS standard symbols > If the symbol is widespread, i.e. actually found used in materials of diverse types/contexts by diverse publishers, including governmental (still need a more cogent statement of this) Criteria for exclusion: > If the symbol is MERELY a drawing (stylized or not) of something, e.g. this is intended to exclude pictures of cows, dragons, etc. > If the symbol is usually used in 2-Dimensional diagrams, e.g. circuit components, weather chart symbols > If the symbol is composable, e.g. a slash through some other symbol indicating negation, APL composites(?!) > If the symbol is recognized only by a small group of people, e.g. technical symbols for some special field ... analogous to the Buginese alphabet: these characters exist but are just not "common" enough, at least for Unicode 1.0 Joe
From glennw@Sun.COM Tue Nov 14 15:41:40 1989
Date: Tue, 14 Nov 89 13:02:40 PST
From: glennw@Sun.COM (Glenn P. Wright)
To: unicode@Sun.COM
Subject: Unicode Consortium details

Dear all.

During the progress of the last two or three Unicode committee meetings we have been discussing the notion of turning the committee into a consortium. In following mail I will outline the proposed charter and ground rules for the consortium. I believe the proposed rules and charter for the consortium are roughly in line with the wishes of the existing committee. I personally believe that the formation of this consortium is critical to the dispersal of information regarding the Unicode scheme. The consortium should allow us to have involvement from a broader range of organisations and regions.

Please take time to review the mail that will shortly follow. Unless there are specific objections to the layout I will begin the process of identifying an organization, spokesperson and location for the Unicode consortium. In particular I would like to hear suggestions regarding other organisations and individuals that you feel should be added to the following list of interested parties:

Unicode Interest, Electronic mailing list, to date:
# James Higa - NeXT
# Paul Hegarty - NeXT
# Matt Morse - NeXT
# Lee Collins - Apple
# Joe Becker - Xerox
# Jackson - Adobe Systems
# Tom Yap - Sun Intercon
# Rick Kwan - Sun Intercon
# Albert Fung - Sun Intercon
# Nelson Ng - Sun
# Bill English - Sun
# Teruhiko Kurosaka - Sun Intercon
# Karen Smith-Yoshimura - Research Libraries Group
# Mike Kernaghan - Metaphor
# Ken Whistler - Metaphor/Berkeley
# Erik Wendelboe - HP
# Wayne Krone - HP
# Mike Ksar - HP
# Gary Miller - IBM
# Joe Bosurgi - Claris
# Rick McGowan - AT&T USO Japan
# Hiromichi Kogure - AT&T USO Japan
# Doug Merritt - Hunter Systems

Glenn Wright
================================
Sun Microsystems
2550 Garcia Avenue
Mountain View
California CA 94043
USA.
Tel (1) 415 336 6983
gwright@sun.com or {..sun}!gwright
From lcollins@apple.com Tue Nov 14 17:55:20 1989
Date: Tue, 14 Nov 89 17:52:21 PST
To: unicode@Sun.COM
Subject: Microsoft report

I talked to Michel Suignard, who handles international code sets for Microsoft. He is very interested in Unicode, likes the separation of text content and form, and has taken the Unicode charts on a trip to Asia where they were well received even in Japan. He is faxing down a list of questions and concerns which I hope to be able to discuss at this week's meeting. He would like to be invited to future meetings (given time to make the travel arrangements). Apparently dealing with code pages is such a pain that some at Microsoft have understood the vision of Unicode. We could see it on a future version of OS/2.

Lee