=========================================================================       
Date:         Thu, 1 Aug 1991 17:34:21 EDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
From:         schein@TOROLAB5.VNET.IBM.COM                                      
Subject:      New version of C0 contribution from Tom Hastings - Digital        
                                                                                
                                    ISO                                         
                                                                                
               INTERNATIONAL ORGANIZATION FOR STANDARDIZATION                   
                                                                                
                ORGANISATION INTERNATIONALE DE NORMALISATION                    
                                                                                
                     Multiple-Octet Coded Character Set                         
                                                                                
                                                                                
                           ISO-IEC JTC1/SC2/WG2 N                               
                                                                                
                             Date: 27-July-1991                                 
                                                                                
                                                                                
     This paper is in response to the proposal that the second ISO DIS          
     10646 adopt the approach of using the C0 and C1 space for coding           
     graphic characters (and then unifying the coding of the two                
     standards).  There is a summary at the end.                                
                                                                                
     Since the first version of this paper, dated 16-July, I've added           
     comments and suggestions (indicated by change bars) [I took the
     change bars out of this electronic version, because they messed up
     the formatting.  They are in the paper copy that you will get.
     -Avery]:
                                                                                
     1.  Fold C1 characters also, since some mail systems remove them.          
                                                                                
     2.  Indicate that escape sequences (control sequences and control          
         strings from ISO 6429 too) can be used in ISO 10646 without            
         interior NULL padding, only initial and final NULL padding.            
                                                                                
     3.  Add ESC 2/5 2/15 F, ESC 2/0 F, and ESC Fs announcer                    
         alternatives to using HOP (hex 81).                                    
                                                                                
     4.  Add parameters to HOP announcer alternatives.                          
                                                                                
     5.  Remove announcer for Little Endian, since data should be in a          
         single standard order for interchange and in programs; byte            
         swapping should happen when interchange data is read and written       
         on Little Endian machines.                                             
                                                                                
     6.  Change terminology from "one-octet form" to "current ISO 2022          
         form", since ISO 2022 data can be two-byte, such as in current         
         ideographic character sets.  Change bars not used, since not a         
         substantive change.                                                    
                                                                                
     1  Relationship to current coding standards                                
                                                                                
     At first blush, this seems to be a highly incompatible proposal.
     However, another way of looking at the proposal is that it
     expands the fundamental (ISO 2022) coding unit used in character
     coding from being only one (7-bit or 8-bit) byte to also being
     two or four octets.  With this view, each character in
     current standards can be thought of as being coded in one, two, or
     four octets, depending on the form of coding.  When a current ISO
     2022 conforming standard is represented in this expanded two- or
     four-octet form, the high-order, most-significant octet(s) are
     zero (corresponding to the ASCII/ISO 2022 NULL character when              
     viewed in current ISO 2022 form).  Thus the C0 and C1 control              
     standards are preserved and can be used in this expanded form with         
     each C0 (8-bit) bit combination being represented in two octets            
     with the first octet zero.  Converting some software programs in           
     some programming languages may consist solely of specifying that           
     all character data is two or four octets, instead of one octet,
     and recompiling.  This may be worth SC2 pointing out to SC22.
     Finally, with this Unicode approach, 65/65536 of the code space is
     used for coding control functions, which is less than 0.1%,
     compared to the current ISO DIS 10646, which uses 44% for coding
     control functions.
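
     As a minimal sketch of the expansion described above (the function
     names are mine, chosen for illustration, and are not part of any
     standard), one-octet data can be widened by placing zero octets in
     front of each original octet.  The last function just restates the
     65/65536 figure from the text.

```python
def widen(octets: bytes, width: int) -> bytes:
    """Expand one-octet data to two- or four-octet form.

    Each original octet becomes the low-order octet of a unit whose
    high-order octets are zero (NULL), as the text describes.
    """
    if width not in (2, 4):
        raise ValueError("width must be 2 or 4")
    return b"".join(b"\x00" * (width - 1) + bytes([o]) for o in octets)


def control_fraction() -> float:
    """Share of a two-octet code space used by the 65 control positions
    (the figure 65/65536 is taken from the text)."""
    return 65 / 65536
```

     For instance, widen(b"A", 2) gives the octets 00 41, and the
     control share works out to just under one tenth of one percent.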
                                                                                
     In fact, ISO/IEC JTC1 SC2 could consider the expansion to two- and
     four-octet forms as an addition to ISO 2022 by specifying that an
     ISO 2022 bit combination can be two or four octets, not just one
     7- or 8-bit byte, but more on that later.
                                                                                
                                    NOTE                                        
                                                                                
     The reason to consider the expansion as two and four octets,
     rather than 16 and 32 bits, is 1) to avoid the confusion of Big
     Endian vs.  Little Endian, 2) to make it clear how the data maps
     onto one-octet transmission channels and storage media, and 3)
     to reflect that all protocol standards are octet based.
                                                                                
     1.1  Representation of escape sequences, control sequences, and            
          control strings                                                       
                                                                                
     The representation of escape sequences of ISO 2022 and ISO 6429            
     and the control sequences and control strings of ISO 6429 would            
     not require NULL padding of each bit combination in the
     sequence of bit combinations that follows the introducer                   
     character.  This is because ISO 2022 and ISO 6429 indicate that            
     the bit combinations that follow (hex range 20 to 7E) are to be            
     interpreted independently of the graphic character set(s)                  
     currently designated and invoked.  Only initial NULL padding of            
     the introducer character (ESC, CSI, DCS, OSC, PM, etc.) and NULL
     padding after the last octet to align the next character on a              
     character boundary would be required.                                      
                                                                                
            Example: the escape sequence ESC 6/0 (hex 1B60) would be            
            represented as:                                                     
                                                                                
              In 2-octets:  001B 6000                                           
              In 4-octets:  0000001B 60000000                                   
                                                                                
            Example: the escape sequence ESC 2/5 4/0 (hex 1B2540) would         
            be represented as:                                                  
                                                                                
              In 2-octets:  001B 2540                                           
              In 4-octets:  0000001B 25400000                                   
                                                                                
     Longer escape and control sequences and control strings approach           
     the efficiency of current ISO 2022 representation.                         
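
     The padding rule above can be sketched as follows.  This is only
     an illustration under the rule as stated here (initial NULL
     padding of the introducer, trailing NULL padding to the next
     character boundary); the name pad_sequence is hypothetical.

```python
def pad_sequence(seq: bytes, width: int) -> bytes:
    """Represent an escape/control sequence in two- or four-octet form.

    Only the introducer (the first octet of seq) is NULL padded in
    front; the remaining octets follow unpadded, and NULLs are added
    at the end so the next character starts on a character boundary.
    """
    if width not in (2, 4):
        raise ValueError("width must be 2 or 4")
    if not seq:
        raise ValueError("empty sequence")
    out = b"\x00" * (width - 1) + seq
    trailing = (-len(out)) % width  # octets needed to reach a boundary
    return out + b"\x00" * trailing
```

     With this sketch, pad_sequence(b"\x1b\x60", 2) yields 001B 6000
     and pad_sequence(b"\x1b\x25\x40", 4) yields 0000001B 25400000,
     matching the two examples above.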
                                                                                
             LET'S NOT ASSIGN GRAPHIC CHARACTERS TO ROW HEX 1B                  
                                                                                
     In order to reduce the possible confusion of ISO 10646M characters
     with escape sequences, let's skip assigning characters to row
     hex 1B for now, though 1B appears as the second octet of many
     characters, so there will still be confusion with existing
     devices.
                                                                                
     1.2  Coding of ISO 10646M controls: PAD, SGCI, HOP, IUCS                   
                                                                                
     The current ISO DIS 10646 uses up 3 precious C1 control character
     positions (see ISO 6429).  They are PAD (hex 80), HIGH OCTET
     PRESET (HOP = hex 81) and SINGLE GRAPHIC CHARACTER INTRODUCER
     (SGCI = hex 99).
                                                                                
     Since PAD can now be done with NUL (hex 00), we don't need PAD.            
                                                                                
     Since SGCI would need to have C0 and C1 bit combinations following         
     to represent many ISO 10646M graphic characters, SGCI cannot be            
     used in current ISO 2022 coding; therefore, we can code SGCI in            
     ISO 10646M using one of the code extension function code
     positions: SO (hex 0E), SI (hex 0F), SS2 (hex 8E), or SS3 (hex             
     8F), since they cannot be used from within ISO 10646M for ISO 2022         
     code extension.  Any of the other C0 or C1 control characters              
     could be used from within ISO 10646M, so we don't want to preclude         
     that by using those bit combinations in row 0 of ISO 10646M.               
                                                                                
     The IDENTIFY UNIVERSAL CHARACTER SUBSET (IUCS, coded as an ISO 6429
     control sequence: CSI Ps...  2/0 6/13, which is hex 9B Ps...
     20 6D) is used to identify subsets.  We can continue to use it if
     it is useful outside ISO 10646, or we could use one of the code            
     extension code points: 000E, 000F, 008E, 008F.                             
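
     For reference, a rough sketch of how the IUCS control sequence
     quoted above (hex 9B Ps... 20 6D) is put together in current ISO
     2022 form.  It assumes the usual ISO 6429 parameter syntax of
     decimal digits separated by 3/11 (";").  The parameter values used
     with it are made up; real subset identifiers would come from the
     standard.

```python
CSI = b"\x9b"            # 8-bit CONTROL SEQUENCE INTRODUCER
IUCS_FINAL = b"\x20\x6d"  # intermediate 2/0 followed by final 6/13


def iucs(*subset_ids: int) -> bytes:
    """Build CSI Ps... 2/0 6/13 for the given (hypothetical) subset numbers."""
    if any(n < 0 for n in subset_ids):
        raise ValueError("parameters must be non-negative")
    params = b";".join(str(n).encode("ascii") for n in subset_ids)
    return CSI + params + IUCS_FINAL
```

     For example, iucs(1, 2) produces the octets 9B 31 3B 32 20 6D.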
                                                                                
     2  Interworking/co-existing with existing equipment and software           
                                                                                
     In order to decide whether we can change ISO DIS 10646 to use the          
     Unicode approach and coding, we have to answer the following               
     three questions:                                                           
                                                                                
                                                                                
                                                                                
           1.  What problems will using the C0 and C1 space for graphic         
           characters in ISO 10646 cause when such data is used with            
           existing equipment and existing software [that only views
           character data in current ISO 2022 form and so may look
           for any C0 or C1 octet and take some action]?
                                                                                
           2.  In order to be fair, we also need to ask the same                
           question about the current ISO DIS 10646: "What problems             
           will using the 02/00 (hex 20) octet (= ASCII SPACE                   
           character) for graphic characters in ISO 10646 cause when            
           such data is used with existing equipment and existing               
           software [that only views character data in current ISO 2022
           form and may look for the SPACE (hex 20) octet and take some
           action]?"
                                                                                
           3.  We also need to consider future equipment and software           
           that may wish (or need) to support the current ISO 2022 form         
           as well as the new ISO 10646M two and four-octet forms in            
           order that new ISO 10646M data and programs can be                   
           introduced into existing systems in an evolutionary
           way.                                                                 
                                                                                
           Existing equipment includes:

               Terminals
               Printers
               Modems
               Terminal/printer concentrators
               Host connections

           Existing software includes:

               Operating Systems:

                  Terminal and printer I/O drivers
                  Command language interpreters
                  Call interfaces
               File Systems
               Compilers
               Application programs
                                                                                
     3  Terminals and Printers                                                  
                                                                                
     One area where this new coding form will have a major impact is
     terminals and printers that use asynchronous, serial full-duplex
     and half-duplex lines to connect to modems, concentrators, and
     hosts.  There are the following possibilities for system
     connections using serial-lines:
                                                                                
     3.1  Output to existing terminals and printers                             
                                                                                
     Obviously, existing terminals and printers cannot be expected to           
     send/receive the new two- and four-octet data.  However,
     terminals and printers that receive ASCII and ISO 8859-1 one-octet
     data would be able to receive and image the proposed ISO
     10646/Unicode two- and four-octet data forms that are represented
     by row 0, since the terminals and printers ignore NULL (hex 00)            
     octets.                                                                    
                                                                                
     While existing terminals and printers ignore NULL (hex 00) when            
     received in graphic character data and within single byte                  
     control characters, I'm not so sure about embedded NULLs in ISO            
     6429 escape sequences and control sequences.  However, there
     isn't a need to embed the NULLs in the middle of such sequences            
     (see Section 1.1).                                                         
                                                                                
     Existing terminals and printers interpret a NULL (or 3 NULLs)              
     preceding the single character format effectors: HT, CR, LF, VT,           
     FF, and BS correctly.  Even NULL CR NULL LF is interpreted as CR           
     LF on existing terminals and printers.                                     
                                                                                
     Existing asynchronous full-duplex serial-line terminals and                
     printers also correctly interpret received NULL XON (DC1 = hex
     11) and NULL XOFF (DC3 = hex 13) used for flow control (see below).
                                                                                
     These same existing ASCII and ISO 8859-1 terminals and printers
     receiving ISO DIS 10646 data would image one or three SPACEs
     between each character.
                                                                                
     So for output of the ISO 8859-1 subset, the Unicode approach is            
     more compatible with existing terminals and printers than the 1st          
     ISO DIS 10646 coding.                                                      
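
     The comparison in this section can be illustrated with a toy model
     of an existing one-octet device.  The model (NULL octets ignored,
     everything else imaged as ISO 8859-1) and the function names are
     assumptions made for the sketch; real devices differ in detail.
     Hex 20 stands in for the high-order octet of the first ISO DIS
     10646 form, as described in the text.

```python
def two_octet(text: str, high: int) -> bytes:
    """Row-0 characters in two-octet form with `high` as the first octet."""
    return b"".join(bytes([high, ord(c)]) for c in text)


def legacy_image(octets: bytes) -> str:
    """Toy model of an existing terminal: drop NULL, image the rest
    as ISO 8859-1."""
    return bytes(o for o in octets if o != 0x00).decode("latin-1")
```

     In this model, legacy_image(two_octet("Hi", 0x00)) gives "Hi",
     while legacy_image(two_octet("Hi", 0x20)) gives " H i", with a
     SPACE imaged ahead of every character.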
                                                                                
     3.2  Output to future terminals and printers                               
                                                                                
     The need for a standardized announcer for switching from current
     ISO 2022 form to two- or four-octet forms is vital for output to
     new terminals since implementation of the terminals is often
     done by different vendors than the concentrators, modems, and
     host systems.  [We will see later that an announcer is also vital
     for interchange of file data for software use and for checking
     that the data is as expected for processing (or for doing code
     expansion/reduction), so we might as well use the same announcers
     there too.]
                                                                                
     3.3  Input from existing terminals                                         
                                                                                
     Existing terminals input only in current ISO 2022 form and do not          
     announce that.                                                             
                                                                                
     3.4  Input from new terminals                                              
                                                                                
     New two- and four-octet terminals will probably want to operate in
     a one-octet ASCII, ISO 8859-1, or other ISO 2022 form (including
     two byte sets) as well.  Therefore, terminal concentrators,
     modems, and host systems will want to be able to switch between
     current ISO 2022, two-octet, and four-octet forms of terminal input.
     Often this switching will happen during a session.  The beginning          
     of the session would likely start up in current ISO 2022 form and          
     switch to two or four octet form if both parties agree and support         
     that.                                                                      
                                                                                
     A new terminal sending two or four-octet data to existing                  
     operating system software such as command line interpreters and            
     text editors, would probably still work correctly if the user              
     limited himself to row 0 (ASCII or ISO 8859-1 characters) with             
     Unicode coding, since most such software probably filters out              
     NULLs.  However, application software might NOT work so well; that
     depends on the run-time library handling of NULL octets.  Still,
     I predict that most new Unicode/ISO 10646 terminals that support
     serial-lines will have a user controlled mode to run in current
     ISO 2022 form as well, so that they can be connected to existing
     systems.
                                                                                
     3.5  Input from existing printers                                          
                                                                                
     Existing ISO 6429 and other character oriented serial-line                 
     full-duplex printers input limited amounts of status information,          
     if enabled.  PostScript[1] printers send arbitrary amounts of
     input data in ASCII and need flow control on input as well as              
     output.                                                                    
                                                                                
          [1] PostScript is a registered trademark of Adobe Systems, Inc.
                                                                                
                                                                                
     3.6  Input from future terminals needs announcers

     The need for a standardized announcer for switching from current
     ISO 2022 form to two- or four-octet forms is vital for input from
     new terminals since implementation of the terminals is often done
     by different vendors than the concentrators, modems, and host
     systems.  [We will see later that an announcer is also vital for
     interchange of file data for software use and for checking that
     the data is as expected for processing (or for doing code
     expansion/reduction), so we might as well use the same announcers
     there too.]
                                                                                
                                                                                
                                                                                
     4  Modems and Terminal Concentrators                                       
                                                                                
     4.1  Asynchronous serial-line communication                                
                                                                                
     Full-duplex, serial-line modems and terminal concentrators pass            
     the character data through in both directions.  All data is passed         
     through transparently, except for the C0 octets used for flow
     control: DC1 (hex 11) and DC3 (hex 13), called XON and XOFF
     (Control Q and Control S).
                                                                                
     Also most concentrators and modems pass NULLs through.  This is            
     becoming increasingly necessary, since the IBM PRO printer has
     embedded NULLs in its escape sequences.  This will help get                
     Unicode data through existing concentrators and modems.                    
                                                                                
     5  XOFF/XON Flow Control on asynchronous serial full-duplex lines          
                                                                                
     The XOFF/XON flow control problem is perhaps the biggest                   
     impediment to using the Unicode encoding technique.                        
                                                                                
     Unicode coding uses XON (hex 11) and XOFF (hex 13) octets as the
     second octet for a number of graphic characters, starting with
     Cyrillic.  Hex 11 and hex 13 haven't been assigned as a first octet
     for any characters yet in Unicode.  It may be well to hold off
     allocating characters to hex rows 11 and 13 for a while.
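
     To make the collision concrete, here is a small sketch that
     reports where the flow control octets turn up when a code value is
     written out most-significant octet first.  The function name is
     hypothetical; DC1 (hex 11) and DC3 (hex 13) are the octets that
     existing serial-line equipment acts on.

```python
from typing import List

FLOW_OCTETS = {0x11, 0x13}  # DC1 and DC3, used for XON/XOFF flow control


def flow_control_positions(code: int, width: int = 2) -> List[int]:
    """Return the octet positions (0 = first octet) holding hex 11 or 13."""
    octets = code.to_bytes(width, "big")
    return [i for i, o in enumerate(octets) if o in FLOW_OCTETS]
```

     For example, the two-octet value hex 0411 (in the Cyrillic row)
     has a flow control octet at position 1, hex 0041 has none, and any
     value in row hex 13 would have one at position 0.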
                                                                                
     5.1  Background                                                            
                                                                                
     The XOFF/XON flow control technique on asynchronous full-duplex
     serial lines permits the recipient (terminal/printer or
     modem/concentrator) to stop the sender (modem/concentrator or
     terminal/printer) temporarily, if the sender sends too much
     character data for the receiver to keep up.  The recipient sends
     the XOFF down the other line to stop the sender.  For terminals it
     works in both directions.  For printers, except PostScript
     printers, the flow control is only needed for the printer to
     restrain the sender.  For PostScript printers, which can send an
     arbitrary amount of data, the host needs to be able to restrain
     the printer's input as well.
                                                                                
     The XOFF/XON flow control technique is the least expensive method          
     for implementing flow control in the serial-line asynchronous              
     market.  It does not require any additional wires.  Wiring in              
     buildings is typically four wires, as is phone company practice.           
     Thus existing wiring in buildings can implement a single                   
     full-duplex connection.                                                    
                                                                                
     Another method for flow control over serial-lines is to use two            
     additional signals: DTR and DSR for flow control.  This takes              
     two additional wires.  Asynchronous printers often offer the               
     customer both methods when connecting up his printer.  However,            
     the video terminal market has not done this since low cost is              
     even more important and they are more often connected to wiring in         
     a building.  Printers tend to be installed near to the host,               
     concentrator or modem, so the extra cost of more wires isn't a             
     factor.                                                                    
                                                                                
     Another alternative rarely used today, is to implement a full              
     handshaking packetized protocol over the full-duplex                       
     serial-line.  The sender and receiver use the protocol to control          
     their rate of flow.                                                        
                                                                                
                                  QUESTION                                      
                                                                                
     Is there an ISO, Internet, or other standard for a protocol for            
     use over serial, full-duplex lines?                                        
                                                                                
     Some answers: Internet Mail is working on some techniques:
     uuencode, btoa, and binhex (see attached mail from Greg Vaudreuil,
     Chairman IETF SMTP Extensions Working Group).
                                                                                
     5.2  Possible solutions to handling the XOFF/XON problem on serial         
     lines                                                                      
                                                                                
     There are a number of solutions to avoiding the use of single              
     octet hex 11 and hex 13 with ISO 10646/Unicode data on                     
     serial-lines:                                                              
                                                                                
     5.2.1  Use two- and four-octet XOFF and XON

     One approach for new terminals, printers, concentrators, modems and
     hosts is to use the two- and four-octet XOFF and XON controls when
     operating in two- and four-octet form.  This requires the use of
     an announcer to indicate when two- or four-octet form is being
     used in each direction and when returning to current ISO 2022
     form.
                                                                                
     5.2.2  Use additional wires                                                
                                                                                
     Use out-of-band flow control with additional wires.  This is not
     possible in many circumstances, but is a good alternative when it
     can be used (especially for printers).  This may become a popular
     method, but it needs a standard (de facto or de jure), so that
     terminal/printer manufacturers can connect to concentrator/modem
     manufacturers' equipment.
                                                                                
     5.2.3  Use a real protocol on the serial-line                              
                                                                                
     Needs standards here.  Are there ones in existence?                        
                                                                                
                                                                                
     5.2.4  Byte stuff hex 11 and hex 13 when it occurs in data                 
                                                                                
     Another approach is for the sender to convert any hex 11 and hex
     13 octets in the data to something else, so that the existing
     equipment won't think an XOFF or XON is being sent.  The receiving
     equipment converts it back.
                                                                                
     Byte stuffing can be implemented in hardware (or by a hardware
     assist which interrupts on particular bit patterns) or completely
     in host operating system, run-time library or even application
     program software.
                                                                                
                                                                                
     5.2.4.1  BISYNCH byte stuffing algorithm                                   
                                                                                
     One alternative would be to use IBM's BISYNCH method using a
     transparency mode entered by DLE STX, left by DLE ETX.  Real XON
     and XOFF would be sent as usual; data that looks like XON, XOFF,
     or DLE would be sent as DLE XON, DLE XOFF, or DLE DLE respectively.
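
     The stuffing rule just described can be sketched as follows.  This
     is a minimal illustration, not the BISYNCH specification: the
     DLE STX / DLE ETX framing is omitted, and the names stuff and
     unstuff are hypothetical.  Note that, as described, the escaped
     octet still appears on the line after the DLE, so only equipment
     that understands the DLE prefix can tell data from flow control.

```python
DLE, XON, XOFF = 0x10, 0x11, 0x13
ESCAPED = {DLE, XON, XOFF}


def stuff(data: bytes) -> bytes:
    """Prefix every data octet that looks like DLE, XON or XOFF with DLE."""
    out = bytearray()
    for octet in data:
        if octet in ESCAPED:
            out.append(DLE)
        out.append(octet)
    return bytes(out)


def unstuff(wire: bytes) -> bytes:
    """Undo stuff(); bare XON/XOFF octets are real flow control, not data."""
    out = bytearray()
    octets = iter(wire)
    for octet in octets:
        if octet == DLE:
            try:
                octet = next(octets)      # the octet after DLE is data
            except StopIteration:
                raise ValueError("dangling DLE at end of data") from None
        elif octet in (XON, XOFF):
            continue                      # genuine flow control: drop it
        out.append(octet)
    return bytes(out)
```

     A receiver built this way can still honour a real XOFF inserted by
     the line equipment in the middle of the data, because only the
     DLE-prefixed occurrences are treated as data.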
                                                                                
     We might use the HOP code to get into two or four-octet mode that          
     requires transparency, instead of using DLE STX and DLE ETX (see           
     Section 7).  I strongly recommend that the ISO 10646 standard              
     include an informative annex that recommends a particular byte             
     stuffing algorithm so that XON (hex 11) and XOFF (hex 13) can be
     used for flow control as single octets as is current practice.             
                                                                                
                                   ISSUE                                        
                                                                                
     Are there other byte stuffing algorithms?  Any ISO standard ones?          
                                                                                
     5.2.5  Fold graphic two-octet data out of C0, SP, DEL, C1, FF              
     space                                                                      
                                                                                
     Another approach that can be used, more typically in software and
     new Unicode/ISO 10646 terminals and printers, is for the sender to
     convert the Unicode/ISO 10646 data about to be sent so that it
     isn't confused with current data.  The Apple proposal from Mark
     Davis, Rick Sewill, and Rob Hawley seems a good one (reproduced
     here with slight modification as indicated), though we need to
     extend it to four-octet ISO 10646 data as well [I didn't do this
     yet].  It meets the following properties:
                                                                                
     1.  The pervasive C0 and ASCII characters are sent as one-octet            
     data compatible with existing standards, equipment, and software.          
                                                                                
     2.  The new Unicode/ISO 10646 graphic characters above hex 007F            
     are folded so that they do not use hex 00..1F (C0), SPACE (hex             
     20), DEL (hex 7F), C1 (hex 80..9F), or hex FF (sometimes thrown            
     away as if DEL) and are sent in two or three octets.                       
                                                                                
     The algorithm is:                                                          
                                                                                
     1.  Map Unicode/ISO 10646 characters 0000..007F to 00..7F.  [I
     didn't include mapping the C1 characters (0080..009F to 80..9F),
     because some mail systems remove C1 characters (see attachment
     from IETF SMTP Extensions Working Group Chairman).  I also didn't
     include mapping hex 00FF to FF, since 00FF is Unicode/Latin-1
     SMALL LATIN LETTER Y WITH DIAERESIS and some existing
     communication systems and/or software confuse FF with DEL (7F) and
     remove it.]  Then Unicode/ISO 10646 C0, SPACE, ASCII graphics, and
     DEL characters are represented as current one-byte C0, SPACE,
     ASCII graphics, and DEL characters and so can pass through
     existing software and communications channels that assume C0,
     SPACE, ASCII graphics, and DEL characters.
                                                                                
     2.  Map the next 93*179=16,647 Unicode/ISO 10646 characters                
     starting with hex 00A0 into two octets in which the first octet is         
     in the range hex A0..FC (93 possible values) and the second octet          
     is in the range hex 21..7E, A0..FC (179 possible values).                  
                                                                                
     3.  Map the remaining 2*179*179=64,082 Unicode/ISO 10646 characters
     into three octets in which the first octet is hex FD..FE and the
     second and third octets are in the range hex 21..7E, A0..FC (179
     possible values each).
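
     The three steps above can be sketched as follows.  This is an
     illustration under stated assumptions, not the Apple proposal
     itself: the text does not give the exact arithmetic, so the sketch
     assumes code points are assigned in ascending order starting at
     00A0, and it rejects 0080..009F because the text does not say where
     those go.  The octet ranges as listed give 94 + 93 = 187
     second-octet values rather than the 179 quoted, so the counts are
     derived from the ranges instead of being hard-coded.  The names
     fold and unfold are hypothetical.

```python
def octet_values(*ranges):
    """Expand inclusive (low, high) ranges into a flat list of octet values."""
    return [v for low, high in ranges for v in range(low, high + 1)]


LEAD2 = octet_values((0xA0, 0xFC))                # first octet, two-octet form
LEAD3 = octet_values((0xFD, 0xFE))                # first octet, three-octet form
TRAIL = octet_values((0x21, 0x7E), (0xA0, 0xFC))  # second and third octets
N = len(TRAIL)
TWO_OCTET_CAPACITY = len(LEAD2) * N
FIRST_FOLDED = 0x00A0

LEAD2_INDEX = {v: i for i, v in enumerate(LEAD2)}
LEAD3_INDEX = {v: i for i, v in enumerate(LEAD3)}
TRAIL_INDEX = {v: i for i, v in enumerate(TRAIL)}


def fold(cp):
    """Fold one 16-bit code point into one, two or three octets."""
    if 0x0000 <= cp <= 0x007F:
        return bytes([cp])                        # step 1: sent as-is
    if not FIRST_FOLDED <= cp <= 0xFFFF:
        # Assumption: 0080..009F is rejected; the text leaves it open.
        raise ValueError("code point %04X is not handled by this sketch" % cp)
    index = cp - FIRST_FOLDED
    if index < TWO_OCTET_CAPACITY:                # step 2: two octets
        first, second = divmod(index, N)
        return bytes([LEAD2[first], TRAIL[second]])
    first, rest = divmod(index - TWO_OCTET_CAPACITY, N * N)  # step 3
    second, third = divmod(rest, N)
    return bytes([LEAD3[first], TRAIL[second], TRAIL[third]])


def unfold(data):
    """Inverse of fold(); decodes from the start of the data."""
    out, i = [], 0
    try:
        while i < len(data):
            b = data[i]
            if b <= 0x7F:
                out.append(b)
                i += 1
            elif b in LEAD2_INDEX:
                index = LEAD2_INDEX[b] * N + TRAIL_INDEX[data[i + 1]]
                out.append(FIRST_FOLDED + index)
                i += 2
            else:
                index = ((LEAD3_INDEX[b] * N + TRAIL_INDEX[data[i + 1]]) * N
                         + TRAIL_INDEX[data[i + 2]])
                out.append(FIRST_FOLDED + TWO_OCTET_CAPACITY + index)
                i += 3
    except (KeyError, IndexError):
        raise ValueError("malformed or truncated folded data") from None
    return out
```

     Because the second and third octets may fall in 21..7E, folded data
     must be decoded from a known starting point; a receiver that joins
     the stream in the middle cannot tell a trailing octet from an ASCII
     graphic.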
                                                                                
     I strongly recommend that the ISO 10646 standard include a second
     normative annex that recommends this particular folding that can
     be used to transform Unicode/ISO 10646 data into a form that can
     get by existing hardware and software.  By having two
     recommendations in two annexes, implementors will choose among
     them (or do both), rather than having a proliferation of single
     vendor, implementor workshop agreement, or various consortia
     solutions.  These annexes would NOT be required for conformance
     of interchange or equipment.
                                                                                
     The announcer technique could flag that this folded data follows.          
     Then it wouldn't require prior agreement between sender and                
     receivers whether the data was being folded or not.                        
                                                                                
     6  Host connections                                                        
                                                                                
     Host connections include connecting to concentrators and modems            
     and directly to terminals and printers.  Hosts connected to                
     concentrators or modems control their flow with "out of band"              
     methods, rather than using XOFF/XON.  Hosts that connect directly
     to the terminal or printer with a serial-line have the same
     problems that a concentrator or modem has when connecting to the
     terminal or printer with a serial line (see above).
                                                                                
     7  Alternatives for Announcers to indicate two or four-octet form          
                                                                                
     Announcers are needed to flag data, whether interchanged on                
     communication lines or as complete files.  The use of announcers           
     with fields of records is probably not done, since fields are              
     usually declared as to data type (e.g., which character set) when          
     the record is declared.                                                    
                                                                                
     PostScript has used an announcer on all platforms, consisting of           
     the two ASCII characters: PERCENT (%) EXCLAMATION MARK (!).                
     Printing systems then distinguish PostScript data from ordinary
     (or other) text if the first two characters of the file are %!.
     This has been an invaluable technique for introducing PostScript           
     into existing systems.  ISO 10646 should use a similar technique           
     to announce two-octet form and four-octet form.  It is desirable           
     to also have a method to return to current ISO 2022 form after             
     having entered two or four-octet form, i.e., return to character           
     sets that conform to the current ISO 2022 code extension standard.         
                                                                                
                                                                                
                                                                                
     7.1  Requirements for announcers                                           
                                                                                
     The announcer mechanism must meet the following requirements:              
                                                                                
     1.  The announcer mechanism must distinguish the following types           
     of interchange data:                                                       
                                                                                
          1.  ISO 10646 two-octet data BMP (4 needed)                           
                                                                                
              a.  Whether or not non-spacing accents are used (to               
              dynamically compose characters).                                  
                                                                                
              b.  Whether or not SINGLE GRAPHIC CHARACTER INTRODUCER            
              (SGCI) is used to select single characters in other               
              planes.                                                           
                                                                                
          2.  ISO 10646 four-octet data (1 needed)                              
                                                                                
          3.  Return from ISO 10646M data to existing ISO 2022 coded
          character sets (including ANSI C multi-byte, and ISO 2022
          one-byte and multi-byte sets, EUC, etc).
                                                                                
          4.  Two-octet compaction in which the ideographic zone of the
          specified plane of group 00 replaces the corresponding
          ideographic region of the BMP to form two-octet data (4 * 10
          or so needed)
                                                                                
              a.  Whether or not non-spacing accents are used (to               
              dynamically compose characters).                                  
                                                                                
              b.  Whether or not SINGLE GRAPHIC CHARACTER INTRODUCER            
              (SGCI) is used to select single characters in other               
              planes.                                                           
                                                                                
                     NOTE NO NEED FOR SOME COMPACTION METHODS                   
                                                                                
              One-octet compaction (using C0 and C1 coding assignments)         
              no longer yields any national or ISO standards, except            
              ISO 8859-1, so I did not list that as an announcement             
              requirement here.                                                 
                                                                                
               Three-octet compaction does not seem to be needed, since
               reaching out to get seldom-used characters can be done
               from two-octet form using SGCI when using that form, or
               by using full four-octet form, so I did not list that as
               an announcement requirement here.
                                                                                
          5.  Possibly that "folded" data follows (see Section 5.2.5)
          (1 needed in combination with all of the above)
                                                                                
     2.  The announcer mechanism must be unambiguously interpretable in         
     all 3 forms (current ISO 2022, two-octet ISO 10646M, and                   
     four-octet ISO 10646M forms) or, alternatively, all data                   
     (CC-data-elements) is assumed to start out in current ISO 2022             
     form.                                                                      
                                                                                
     3.  The announcer mechanism must also meet the ISO/ANSI C                  
     programming language standard requirements for so-called                   
     Multi-byte data, in which the data stream is assumed to start out          
     in one-octet ASCII and then switch to any other form.  (The switch         
     can happen immediately as the first data, so any announcer that            
     is interpretable in one octet form suffices here).                         
                                                                                
                                    NOTE                                        
                                                                                
     Requirement 2 above means you can read the first two octets or
     first four octets as a unit if that is what you expect and simply
     check that the announcer is as you expect; otherwise it's an error
     condition or requires code conversion.
                                                                                
     4.  The announcer must occupy a multiple of two or four octets,
     depending on whether two- or four-octet form is being announced.
                                                                                
     7.2  Use of announcers for conforming interchange                          
                                                                                
     These announcers are required for conforming interchange of files          
     or over communication lines, unless there is a higher level                
     protocol, such as an OSI or SMTP protocol, record description, RPC         
     data description, etc., or unless there is prior agreement.                
     However, prior agreement precludes so-called blind interchange.            
                                                                                
     The announcer(s) could be used merely as a check that a program            
     was opening a file that was in the anticipated form or could be            
     used to dynamically convert from one or more types to the desired          
     types, depending on the design of the system run-time (likely to           
     vary between different programming languages).                             
                                                                                
     On communication lines, the announcers are used to indicate the            
     form of following data.                                                    
                                                                                
     In closed systems that are entirely ISO 10646/Unicode, the use of          
     the announcer would be optional.  However, if such a system                
     interchanged its data with other types of systems, it must include         
     the announcer, whether the interchange was through files or                
     communication lines, unless there is a high level protocol, or             
     unless there is prior agreement.                                           
                                                                                
     7.3  Alternative proposals for the announcer mechanism
                                                                                
     There are several approaches for encoding announcers:                      
                                                                                
     1.  Use the C1 HOP (hex 81) that the current ISO DIS 10646 uses            
     for announcing with one or two octets immediately following                
     being the parameters.                                                      
                                                                                
     2.  Use ESC Fs sequences and ESC I Fs sequences, where the Fs              
     characters are assigned by the ISO Registrar, where I is in the            
     range hex 20..2F and Fs is in the range 60..7E.                            
                                                                                
     3.  Use ESC 2/0 F announcers from ISO 2022.

     4.  Use ESC 2/5 2/15 F complete code designators from ISO 2022.
                                                                                
     SC2 should pick one of the following alternatives or another one           
     that meets the requirements.                                               
                                                                                
     7.3.1  Alternatives using HOP                                              
                                                                                
     The following alternatives use the C1 HOP (hex 81) that the
     current ISO DIS 10646 uses for announcing with some number of
     following bytes being parameters, indicating two-octet BMP,
     using non-spacing accents, using SGCI, two-octet compaction,
     four-octet, etc.
                                                                                
     7.3.1.1  Alternative 0: one-octet HOP character with one parameter         
                                                                                
     1st octet: not 81      existing ISO 2022 data follows

     1st-2nd octets: 81xx   any of the two- or four-octet forms, SGCI
                            used, non-spacing accents used, depending
                            on the value of xx; a limited range of xx
                            specifies which plane's ideographic zone
                            replaces the corresponding ideographic
                            zone of the BMP.
                                                                                
     7.3.1.2  Alternative 1: two-octet HOP character with two parameter         
     octets                                                                     
                                                                                
     1st-2nd octets: not 0081    existing ISO 2022 data follows

     1st-4th octets: 0081 xxyy   any of the two- or four-octet forms,
                                 SGCI used, non-spacing accents used,
                                 depending on the value of xx; a
                                 limited range of yy specifies which
                                 plane's ideographic zone replaces the
                                 corresponding ideographic zone of the
                                 BMP.
                                                                                
                                RESTRICTIONS                                    
                                                                                
     Alternative 0: the first octet must be looked at separately from           
     the second octet; therefore, alternative 0 cannot be used in the           
     middle of data to switch to another form, only at the beginning.           
                                                                                
     Alternative 1: Group 00 plane 81 shouldn't have any graphic
     characters assigned to it, so that it wouldn't be confused with
     the announcer.
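
     As an illustration of the two HOP alternatives, the sketch below
     checks the start of a data stream for an announcer and splits off
     the parameter octets.  It assumes the announcer, if present, is the
     very first thing in the data; the function name is hypothetical,
     and the parameter octets are returned uninterpreted because no
     values for xx and yy have been assigned.

```python
HOP = 0x81


def split_hop_announcer(data: bytes, alternative: int = 1):
    """Return (parameters, rest) if data starts with a HOP announcer,
    otherwise (None, data), meaning existing ISO 2022 data follows."""
    if alternative == 0:
        header, n_params = bytes([HOP]), 1          # 81 xx
    elif alternative == 1:
        header, n_params = bytes([0x00, HOP]), 2    # 0081 xxyy
    else:
        raise ValueError("alternative must be 0 or 1")
    end = len(header) + n_params
    if data.startswith(header) and len(data) >= end:
        return tuple(data[len(header):end]), data[end:]
    return None, data
```

     With Alternative 1 the whole announcer is two two-octet units, so a
     reader that already works in two-octet units never has to examine
     a single octet on its own.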
                                                                                
                                                                                
     7.3.2  Alternatives using ESC 2/0 F or ESC 2/5 I F                         
                                                                                
     An alternative approach would be to use some ISO 2022 announcer
     escape sequences of the form ESC 2/0 F, where different F values
     indicate two-octet BMP, using non-spacing accents, using SGCI,
     two-octet compaction, four-octet, etc.  It may be necessary to
     use ESC 2/0 I F, since 20 to 40 different forms may be needed.  As
     with the HOP alternatives, either require an initial pad of NULL or
     require scanning an octet at a time at the beginning (and don't
     assign characters to row 001B in BMP).  Example:
                                                                                
     Alternative 0:  1B20 xx00  and  1B20 20xx
     Alternative 1a: 001B 20xx  and  001B 2020 xx00       (to 2-octets)
     Alternative 1b: 001B 20xx  and  001B 2020 xx00 0000  (to 4-octets)
                                                                                
     A second alternative approach would be to use the ISO 2022
     invocation of a complete code (which ISO 10646M certainly is)
     using the ESC 2/5 2/15 F sequences, where different F values
     indicate two-octet BMP, using non-spacing accents, using SGCI,
     two-octet compaction, four-octet, etc.  Example:
                                                                                
              Alternative 0:  1B25 2Fxx                                         
              Alternative 1a: 001B 252F xx00        (when going to 2-octets)    
              Alternative 1b: 001B 252F xx00 0000   (when going to 4-octets)    
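
     The padding pattern in the examples above can be reproduced with a
     small helper.  This is only a sketch inferred from the hex examples
     (an optional leading NULL, then trailing NULLs up to a multiple of
     two or four octets); the function name is hypothetical, and the
     value hex 41 used for F in the usage note is a placeholder, not an
     assigned final character.

```python
ESC = 0x1B


def pad_escape_sequence(seq: bytes, unit: int, lead_pad: bool = True) -> bytes:
    """Pad an escape sequence so it occupies a multiple of `unit` octets.

    lead_pad=False gives the Alternative 0 layout (no initial NULL);
    lead_pad=True gives Alternatives 1a (unit=2) and 1b (unit=4).
    """
    if unit not in (2, 4):
        raise ValueError("unit must be 2 or 4 octets")
    if not seq or seq[0] != ESC:
        raise ValueError("sequence must start with ESC")
    out = (b"\x00" if lead_pad else b"") + seq
    return out + bytes(-len(out) % unit)    # trailing NULLs to the boundary
```

     For example, with F taken as hex 41, the sequence ESC 2/5 2/15 F
     becomes 001B 252F 4100 when going to two-octet form and
     001B 252F 4100 0000 when going to four-octet form.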
                                                                                
                                    NOTE                                        
                                                                                
     Even if ISO 10646M were to use the ESC 2/5 4/0 sequence to return
     to ISO 2022 (see Section 7.4.2), we can't use the ESC 2/5 F
     sequence to invoke ISO 10646M, because graphic character data
     could accidentally look like ESC 2/5 4/0.  This is because ISO
     10646M has already assigned a graphic character to hex 2540 and
     lots of graphic characters have the second octet hex 1B (ESC).
     ISO 10646M has to use initial padding of the first octet of ESC
     2/5 4/0 to return to ISO 2022 (i.e., hex 001B 2540) in order to
     avoid the conflict with graphic characters.
                                                                                
     7.4  Character Synchronizing                                               
                                                                                
     An announcer must occur at the boundary of a two-octet or
     four-octet character, not in the middle.  Thus the sender must
     know which form the data is in (one-, two- or four-octet) when
     inserting the announcer.
                                                                                
     Can anyone think of an announcer scheme that is self-synchronizing         
     for use when the sender isn't sure of the form of data or where            
     the character boundary is?                                                 
                                                                                
     Perhaps sending four hex 00 octets before the announcer would be           
     sufficient for the recipient to start to look for the hex                  
     00000081 announcer pattern.  If it doesn't occur, the data is              
     treated normally. If the sender knows what the form of data is and         
     where the character boundaries are, the extra four hex 00 octets           
     need NOT be sent.  So for example, at the beginning of a file or           
     at the beginning of a communication session, there is no need              
     for the extra four hex 00 octets in front of the announcer for             
     synchronization (00 HOP or 000000 HOP) since it is clear what the          
     form is and that we are at a character boundary.                           
                                                                                
     When files are concatenated (for example, UNIX pipes), the
     announcer might occur in the middle of the resulting data.
     However, the concatenator might have to convert the data to that
     expected by the program anyway, so the announcers in the middle
     would disappear (or be treated as no-ops if they are the same as
     the original, or as errors if different from the original).
                                                                                
     7.4.1  Little Endian data announcement                                     
                                                                                
     In order to ensure data portability, data must be interchanged in          
     a single standard order, namely with the most significant octet            
     first. For programming languages that choose to represent ISO              
     10646M characters as integers, such as C, their run-time                   
     libraries on Little Endian systems can swap the bytes when                 
     reading and writing data that could be interchanged.  Other
     languages can choose to represent ISO 10646M characters as
     two-octet strings which both Big Endian and Little Endian systems
     store with most significant octet first; these languages need
     never swap bytes and can interchange data on the same system with
     C produced/consumed data.
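
     A run-time library can hide the byte order by always reading and
     writing the interchange form most significant octet first,
     whatever the host order.  The sketch below shows this for
     two-octet form using Python's struct module; the function names
     are hypothetical.

```python
import struct


def write_two_octet(code_points) -> bytes:
    """Serialize code points most significant octet first (">H")."""
    return b"".join(struct.pack(">H", cp) for cp in code_points)


def read_two_octet(data: bytes) -> list:
    """Parse two-octet interchange data into integers on any host."""
    if len(data) % 2:
        raise ValueError("two-octet data must have an even length")
    return [cp for (cp,) in struct.iter_unpack(">H", data)]
```

     The explicit ">" in the format string fixes the order in the data,
     so the same code gives the same octets on Big Endian and Little
     Endian systems.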
                                                                                
     7.4.2  Alternatives for switching back to current ISO 2022 form            
                                                                                
     We need a way for a data stream to switch back to the default              
     current ISO 2022 data form of current systems, i.e., to current            
     character sets conforming to the current ISO 2022, including               
     so-called multi-byte character sets.                                       
                                                                                
     If ISO 10646/Unicode is registered as a complete code according to         
     ISO 2022 and ISO 2375, then it is desirable to have a way to get           
     back.                                                                      
                                                                                
     Alternatives:                                                              
                                                                                
         1.  Use the existing ESC 2/5 4/0 escape sequence specified in          
         ISO 2022 (see ISO 2022 clause 6.3.11) to return to ISO 2022            
         conforming representation.  This escape sequence would be              
         used to switch back to current ISO 2022 form, i.e., to current         
         ISO 2022 conformant character sets that include one- and
         multi-byte character sets.
                                                                                
         The escape sequence ESC 2/5 4/0 would be represented in
         two-octet and four-octet forms as hex:

                 2-octets:    001B 2540
                 4-octets:    0000001B 25400000
                                                                                
         2.  A particular parameter value of the HOP announcer sequence         
         indicates return to ISO 2022:                                          
                                                                                
                 2-octets:    0081 xxyy                                         
                 4-octets:    00000081 xxyy0000                                 
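
     The padding used in both alternatives can be sketched as
     follows.  This is one reading of the layouts shown above, not a
     normative rule: the initial control character is pre-padded with
     hex 00 to the full width, and the remaining octets are
     post-padded with hex 00 to a multiple of the width.  The helper
     names are hypothetical, ESC is taken as 1/11 (hex 1B), HOP as
     hex 81, and the HOP parameter octets xx and yy are left as
     arguments because the paper does not assign them.

```python
def octet(column, row):
    """Convert ISO 2022 column/row notation (e.g. 2/5) to an octet value."""
    return (column << 4) | row


ESC = octet(1, 11)   # 1/11 = hex 1B
HOP = 0x81           # C1 control used by the proposed announcer


def pad_sequence(octets, width):
    """Pad a control sequence for the two- or four-octet form.

    Assumption: the first octet is the control character and gets
    leading zeros; the rest is packed together and zero-filled at the end.
    """
    if width not in (2, 4):
        raise ValueError("width must be 2 or 4")
    if not octets:
        raise ValueError("empty sequence")
    first = bytes(width - 1) + bytes([octets[0]])
    rest = bytes(octets[1:])
    post = (-len(rest)) % width          # zeros needed to fill the last unit
    return first + rest + bytes(post)


def hop_announcer(xx, yy, width):
    """Build HOP xx yy; xx and yy are caller-supplied placeholders."""
    return pad_sequence([HOP, xx, yy], width)


if __name__ == "__main__":
    back_to_2022 = [ESC, octet(2, 5), octet(4, 0)]
    print(pad_sequence(back_to_2022, 2).hex())   # 001b2540
    print(pad_sequence(back_to_2022, 4).hex())   # 0000001b25400000
```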
                                                                                
     8  Revising ISO 2022 to include two-octet and four-octet forms             
                                                                                
     I recommend that ISO 2022 be revised after ISO 10646 is approved           
     to include the expansion idea to two and four octets with ISO              
     10646 given as the only standard for its use.  The ISO 10646               
     announcers would be included.                                              
                                                                                
     9  Summary                                                                 
                                                                                
     This paper proposes the following for the 2nd ISO DIS 10646                
     regarding the C0 space:                                                    
                                                                                
     1.  It's OK for the 2nd ISO DIS 10646 to use C0 and C1 space for
     graphic characters (see Section 1), as long as:                            
                                                                                
         1.  it specifies that all current C0 (hex 00..1F) and C1 (hex          
         80..9F) control characters from existing standards and                 
         implementations can be represented in the new two and four             
         octet form with leading hex 00 or hex 000000, respectively,            
         and                                                                    
                                                                                
         2.  it specifies announcers which indicate the following data          
         is to be interpreted as two-octet or four-octet forms, i.e.,           
         differently than current systems and standards (see Section            
         7), and                                                                
                                                                                
         3.  it specifies the additional capabilities used with
         two-octet form: using BMP, using non-spacing accents, using
         SGCI, two-octet compaction of a specified other basic plane
         with the BMP, and

         4.  it specifies an announcer for "folded" data for use on
         some existing communications systems that avoids using C0,
         DEL, and C1 to represent graphic characters, and
                                                                                
         5.  it specifies an announcer to return to current ISO 2022            
         conforming data of current systems.                                    
                                                                                
     2.  Use HOP xx or 00 HOP xxyy for the announcers.                          
                                                                                
     3.  The appropriate announcer must be required for conforming              
     interchange.  It may be omitted only if there is a higher level            
     protocol that specifies the form or if there is prior agreement            
     on the form.                                                               
                                                                                
     4.  Specify an announcer sequence to return to (current) ISO 2022          
     conformant character sets that include one- and multi-byte
     character sets (see Section 7.4.2).                                        
                                                                                
     5.  Specify that four hex 00 octets serve as a synchronization             
     sequence that can be sent when the sender isn't sure what form the         
     communication line is in and/or is not sure where the character            
     boundaries are (see Section 7.4).                                          
                                                                                
     6.  An informative annex to recommend a particular byte stuffing           
     algorithm or point to a suitable standard for that, so that XON
     (DC1 = hex 11) and XOFF (DC3 = hex 13) can be used for flow control
     on asynchronous, full-duplex, serial communication lines.  The             
     BISYNCH algorithm is OK (see Section 5.2.4).                               
                                                                                
     7.  A second normative annex to specify how to fold ISO                    
     10646/Unicode data so that its two- and four-octet forms cannot be
     confused with existing C0 (hex 00..1F), DEL (hex 7F), and C1 (hex
     80..9F) characters (see Section 5.2.5).                                    
                                                                                
     8.  If any escape and control sequences and control strings are            
     used within ISO 10646M, they only need initial control character           
     prepadding and final character post-padding; interior padding is           
     not needed (see Section 1.1).                                              
                                                                                
                                                                                
     9.  Code any ISO 10646M controls that can only be used inside ISO          
     10646M using C0 or C1 code extension code points (SO, SI, SS2,             
     SS3); leave the other C0 and C1 control character code points              
     (with 00 and 000000 leading padding in two- and four-octet forms,
     respectively) for use with ISO 10646M (see Section 1.2).
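
     Item 6 asks for a byte stuffing algorithm so that DC1 (hex 11)
     and DC3 (hex 13) stay free for flow control.  The sketch below
     shows the general idea only.  It is a hypothetical escape scheme
     chosen for illustration (DLE, hex 10, followed by the octet
     XORed with hex 20); it is not the BISYNCH algorithm and not
     necessarily what Section 5.2.4 describes.

```python
DLE = 0x10                      # escape octet (assumed for this sketch)
DC1 = 0x11                      # flow-control octet reserved for the line
DC3 = 0x13                      # flow-control octet reserved for the line
RESERVED = {DLE, DC1, DC3}      # octets that must never appear as data
MASK = 0x20                     # XOR mask applied to an escaped octet


def stuff(data):
    """Return data with every reserved octet replaced by DLE + (octet ^ MASK)."""
    out = bytearray()
    for b in data:
        if b in RESERVED:
            out.append(DLE)
            out.append(b ^ MASK)
        else:
            out.append(b)
    return bytes(out)


def unstuff(data):
    """Invert stuff(); raises ValueError on a dangling escape octet."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == DLE:
            try:
                out.append(next(it) ^ MASK)
            except StopIteration:
                raise ValueError("stream ends inside an escape") from None
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    # Two-octet forms of the C0 controls DC1 and DC3, padded with hex 00.
    payload = bytes.fromhex("00110013")
    wire = stuff(payload)
    assert DC1 not in wire and DC3 not in wire
    assert unstuff(wire) == payload
```

     With such a scheme the line can still carry padded DC1 and DC3
     control characters as data while the raw octets remain available
     to the flow-control mechanism.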
=========================================================================       
Date:         Thu, 1 Aug 1991 15:14:05 PDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
From:         "F. Avery Bishop 01-Aug-1991 1512" <bishop@DECWET.ENET.DEC.COM>   
Subject:      DEC position on 10646M                                            
                                                                                
To:10646M mailing list                                                         
Subj:DEC position on 10646M                                                    
                                                                                
Digital supports the work of the 10646M ad hoc group chaired by Ed Hart to form 
a single worldwide character code set jointly based on 10646 and Unicode.       
                                                                                
From the national comments on DIS 10646, it is now clear that ISO needs to      
address the desire of users to have just one universal character set.  In       
addition, the Unicode standard, Unicode member companies, computer users, and   
others will realize the following benefits from having a single worldwide
character code.                                                                 
                                                                                
- Support by implementors                                                       
                                                                                
    Implementors will be encouraged to adopt the multilingual code              
    when they know there is only one standard rather than two                   
    "competing" standards. Interoperability will be enhanced when               
    companies not now involved in the Unicode consortium support the            
    code in multiple hardware platforms and operating systems.                  
                                                                                
- Penetration to markets where international standard conformance is required.  
                                                                                
    This includes procurement requirements for contracts with                   
    government agencies (including EC) and international                        
    organizations.                                                              
                                                                                
- Much broader review, resulting in an improved Unicode.                        
                                                                                
    The ISO voting procedure will provide more feedback from countries          
    and other information technology groups.  For example, there were           
    significant comments from the USSR and Greece on their                      
    requirements for DIS 10646.  Unicode can be improved to get better          
    acceptance in those areas by meeting the requirements.                      
                                                                                
- Support by other international and national standards.                        
                                                                                
    There are many standards which must be extended to deal with a
    universal character code, including file structures, ASN.1, all
    OSI protocols, programming languages, and application-level standards
    such as ODA.  These standards can only support other de jure
    standards.  On the other hand, they will be strongly encouraged to          
    support the Unicode code structure if it becomes an ISO standard.           
    Without these extensions, the scope of Unicode will be restricted,          
    and users would need to use other character sets for many                   
    applications.                                                               
                                                                                
Digital therefore urges all concerned to cooperate with the 10646M effort to    
create a unified universal character code set that meets the needs of industry  
and international users and is acceptable as an ISO standard.                   
=========================================================================       
Date:         Fri, 2 Aug 1991 13:00:48 PDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Comments:     Warning -- original Sender: tag was                               
              Joseph_D._Becker.OSBU_North@XEROX.COM                             
From:         Becker.OSBU_North@XEROX.COM                                       
Subject:      Re: DEC position on 10646M                                        
In-Reply-To:  "%pucc.princeton.edu!10646M%JHUVM:BITNET's message of 1 Aug 91    
              15:14:05 PDT (Thursday)"                                          
                                                                                
I agree with one-worldism in general, and with the force of the DEC position as 
expressed by Avery in particular.  But without meaning to stir up sleeping      
hornets, I feel the need to point out again that the goal "one standard" may    
apply at two different levels:                                                  
                                                                                
    > In the loose sense, "one standard" means one 10646M document that we all  
give our blessing.                                                              
                                                                                
    > In the strict sense, "one standard" means that each logical sequence of   
characters has one and only one legal encoding (modulo byte-swapping).
                                                                                
I am worried that if we are unable to hold down on the "compaction methods"     
which might make 10646M into a compose-your-own-encoding portmanteau, then many 
people who sincerely supported unification will find that the bottom line at    
implementation time and run time is that they will STILL be facing a myriad of  
incompatible representations.                                                   
                                                                                
We had a fruitful discussion of "compaction methods" in San Francisco, coming   
to understand that their intent is to try to provide a form of backward         
compatibility (with systems/data using current encodings), by building this     
compatibility into the language defined by the standard.  After careful         
consideration, I think we discovered that it is more effective to take          
compatibility issues out of the encoding syntax and implement them via explicit 
code-conversion processes.  Certainly doing so removes the undesired side       
effect of ambiguous representation introduced by compaction methods.            
                                                                                
If we really hope to arrive at the goals/benefits listed in the DEC position,   
then I think we have to aim for "one standard encoding" in the narrow sense of  
just one representation ... or, okay, just one 16-bit representation and its    
32-bit extension as specified in 10646U.                                        
                                                                                
Joe                                                                             
=========================================================================       
Date:         Fri, 2 Aug 1991 16:52:55 EDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
From:         schein@TOROLAB5.VNET.IBM.COM                                      
Subject:      10646M effort                                                     
                                                                                
To illustrate one of the points in Avery's note I am                            
attaching the X/Open policy statement on standards.
                                                                                
Isai                                                                            
---------------------------------------------------------------                 
                                                                                
  Standards Policy

  To   : SSC
  From : Andrew Walker
  Date : 19th July 1991
  Cc   :
                                                                                
  Ladies and Gentlemen,                                                         
                                                                                
  You will be pleased to know that the X/Open Board approved
  the Standards Policy on 17th July 1991.  The final                            
  wording, which is below, has one change which was                             
  requested by the Board Technical SubCommittee, which                          
  pointed out that CCITT 'Standards' are called                                 
  'Recommendations'.  The words 'Recommendations approved                       
  by' have therefore been added to the second paragraph.
                                                                                
  I would like to thank all those who have contributed to                       
  the development of this policy.  It is a real achievement                     
  which I hope will in time help considerably to create a                       
  good working relationship between X/Open and the                              
  standards world.                                                              
                                                                                
  I was also asked by the Board Technical SubCommittee to                       
  carry out two actions:                                                        
                                                                                
  1: To prepare a communications plan for the Standards
  Policy, for approval by the Marketing Managers.  (I will
  also seek the approval of the Standards Steering
  Committee.)

  2: To prepare a set of Questions and Answers to
  clarify, for internal use, how the standards policy will
  be applied.  (I will seek the approval of these by the
  Standards Steering Committee.)
                                                                                
  The text of the approved Standards Policy is:                                 
                                                                                
  X/Open Standards Policy                                                       
                                                                                
  1: X/Open shall cooperate with formal standards bodies
     to bring standards-based Open Systems to the market
     in a timely and effective manner.  It shall make its
     work available to standards bodies with such release
     of copyright as is required to permit material to be
     incorporated into formal standards.

  2: Where de jure standards exist, X/Open shall conform
     to them.  Wherever possible, X/Open shall use
     International Standards approved by ISO/IEC or
     Recommendations approved by CCITT.  In their absence
     it may adopt Regional or National standards which
     are likely to become internationally adopted.

  3: Where de jure standards are under development,
     X/Open shall ensure that its specifications are
     aligned with them.

  4: Where the results of X/Open work extend beyond that
     covered by the development of de jure standards,
     X/Open shall, in situations where formal
     ratification is appropriate, and where resources
     permit, submit its work to the standardization
     process.

  5: Where there is no de jure standard, X/Open may use
     de facto standards if they are broadly acceptable in
     the market place.

  6: X/Open, and its Technical Working Groups, shall
     observe the rules of the standards bodies with which
     they work and shall offer reciprocal liaison as
     required.
                                                                                
  --                                                                            
                                                                                
  ----------------------------------------------------------------------------  
  Andrew Walker                                         X/Open Company Limited  
  Standards Manager                                   Apex Plaza, Forbury Road  
  EMail: a.walker@xopen.co.uk                        Reading, England, RG1 1AX  
  Tel: +(44) (0)734 508311                            FAX: +(44) (0)734 500110  
  ----------------------------------------------------------------------------  
=========================================================================       
Date:         Fri, 9 Aug 1991 08:26:38 EDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
From:         Edwin Hart <HART@APLVM.BITNET>                                    
Subject:      Publication of the Unicode Book                                   
                                                                                
I purposefully delayed writing this until after the successful completion of    
the CJK-JRG in Japan.                                                           
                                                                                
I am asking the Unicode Consortium to delay publication of the Unicode book.    
I know this is very controversial within the Consortium, but I think you should 
seriously consider delaying it.  I can present several reasons for this:        
                                                                                
1.  It would be an ADDITIONAL gesture of "good faith" on the part of the        
    Consortium.  This would definitely help the merger.  It gives more moderate 
    members of WG2 a better bargaining position with those who will oppose any  
    cooperation with the Consortium.  In short, it will promote the merger.     
                                                                                
2.  It gives the Consortium another item to use to bargain with ISO.  Many in   
    WG2 feel the competition of Unicode to be the first to publish an           
    approved code.  The Consortium does not need to tell WG2 that if the        
    merger discussions fail, the Unicode book will go to press almost           
    immediately.  If you publish the book, (a) you have less "good faith" and   
    (b) WG2 MAY have less incentive to cooperate.  (I believe that the 9        
    negative votes that mention a merger between 10646 and Unicode give WG2
    a lot of incentive to reach an accommodation.)  In short, do not put your   
    chips in the center of the table too soon--you might need them later.       
                                                                                
3.  You have delayed publication of the UniHan portion of Unicode.  This means  
    that if you publish the non-Han portion of Unicode now, people will need to 
    buy the UniHan part later.  That is extra expense for the customer and the  
    publisher.  You could keep the cost down by publishing them together.       
                                                                                
4.  Assuming the merger is successful (and this is not certain right now)       
    I expect that the merged 10646-Unicode code will be slightly different from 
    what Unicode looks like now.  Therefore, if you publish Unicode now, it     
    will be different from what the merged 10646 international standard will    
    be.  As a customer, I always hate it when the real thing is subtly
    different from the documentation.  Moreover, it is always a pain to         
    insert the update pages OR buy the new book with the correct information.   
    By the way, guess who will be publishing the book with the correct          
    information?  ISO!  In short, you do the developers who will implement      
    Unicode a disservice if the book is just "slightly" different from the      
    10646 standard.  They will need to buy another book, either the ISO 10646   
    or a second edition of the Unicode book to obtain the current information.  
    That also leaves the publisher with a surplus of first edition books that   
    will be obsolete a couple of months after they are published.  When you are
    ready with your second edition and the publisher has a warehouse full of    
    unsold first edition books, he is not going to be very happy and I am       
    sure the unhappiness will be passed on to the Consortium in the cost of     
    publishing the second edition.                                              
                                                                                
In summary, I am begging you to not only consider delaying publication of the   
Unicode book but to actually delay publication of it.  I believe it is in the   
best interest of obtaining a merger, in the best interest of people who will    
implement Unicode, and in the best interest of the Unicode Consortium.  You     
cannot wait until the end of the WG2 meeting.  I would suggest that at the      
WG2 meeting, Mark Davis be prepared to tell WG2 that with satisfactory
progress of the merger, the Unicode Consortium will delay publication of its    
book.                                                                           
                                                                                
Best regards,                                                                   
Ed                                                                              
=========================================================================       
Date:         Fri, 9 Aug 1991 10:28:49 PDT                                      
Reply-To:     "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
Sender:       "10646M: Multibyte code working group" <10646M@JHUVM.BITNET>      
From:         "K. Yoshimura" <BL.KSS@RLG.BITNET>                                
Subject:      Possibility of a TC 46 Representative Attending August WG 2       
              Meeting                                                           
                                                                                
In early July I noted that ISO TC 46 delegates who voiced interest in           
attending WG2 meetings to help forge closer cooperation said that they          
couldn't attend the August WG 2 meeting since it coincides with IFLA.  This     
morning I received a call from Berlin: Axel Ermert of the National Library      
said he would try to go to at least part of the meeting.  I'm faxing him the    
WG 2 meeting announcement; he said he'd know for sure next week.  (He already   
has Mike Ksar's contact information.)  I hope you see him there.                
                                                                                
Karen                                                                           
                                                                                
To:  10646M@JHUVM.BITNET                                                        
cc:  BB.WED, SALLY, NISONBS                                                     
