Title: Comments on SC22 N1968, "Unique identifiers for character names in ISO/IEC 10646"
Source: Edwin Hart (USA)
Status: Personal Contribution
Action: For the consideration of WG 2
Distribution: ISO/IEC JTC 1/SC 2/WG 2

To make it easier for WG 2 to decide on the issue of a unique character-identifier, this document merges the requirements from SC22 N1968 and a Canadian proposal for an NP. Although these two documents were developed independently, most of the requirements are common to both. In addition, based on a note from Michael Everson, I have tried to clarify the requirements.

Merged Requirements (from SC22 N1968 and Canadian NP)

The requirements for a short, unique character-identifier are as follows. (The information in square brackets after each requirement gives the source document, SC22 N1968 or the Canadian NP, and the requirement number in that source.)

1. Each of the characters encoded in ISO/IEC 10646 shall have a unique identifier. [SC22 1]
2. The unique identifier shall use ISO/IEC 10646 as a basis for assignment. (Identifiers must refer to the UCS as an anchor point.) [Canada NP 4]
3. The unique identifier shall not change with different editions of the ISO/IEC 10646 standard. (Identifiers shall be stable or defined unambiguously even if characters are eventually reallocated or made obsolete within the UCS. The UCS version may have to be indicated in the identifier.) [SC22 2, Canada NP 5]
4. The unique identifier shall be short, or as short as possible. (Long character names do not meet the SC 22 need.) [SC22 3, Canada NP 1]
5. The unique identifier shall be culturally neutral. (It shall be independent of the language in which ISO/IEC 10646 is written. No one should be forced to buy two different language versions of the ISO/IEC 10646 standard to be able to conform to the standard.) [SC22 4, Canada NP 2 and 8]
6. ISO/IEC 10646 shall allow alternate names for characters (for example, French names). [SC22 5]
7. ISO/IEC 10646 should allow alternate identification schemes for characters (for example, Michael Everson's or Keld Simonsen's), provided the unique identifier is the cross-reference to SC 2 standards. [Canada NP 7]
8. The unique identifier should be of fixed length to be easily processed. [SC22 6, Canada NP 3]
9. So that coded-character-sets other than ISO/IEC 10646 may be used, the identifier should be independent of the coded-character-set in use. [SC22 7]
10. With the unique identifier, it shall be possible to specify ranges of characters. [Canada NP 6]

Proposed Solution

The unique identifier shall be the UCS-4 code position of the character in ISO/IEC 10646. When displayed, the identifier shall be shown as an 8-character hexadecimal (base 16) number using the digits "0" through "9" and the letters "A" through "F" (or "a" through "f"). None of the UCS Transformation Formats (UTFs) shall be used as the basis of the unique identifier. The representation of the unique identifier for processing, storage, interchange, etc., is not specified.

Major Issue

A major issue is: if the 10646 code position becomes the unique identifier, does the unique identifier change when WG 2 decides to move characters? Adopting the proposal implies that characters in the standard should not be moved except in extreme circumstances, because moving a character changes its code position and therefore the proposed unique identifier. Should characters be moved, I think that the identifier must reflect the new code position. Should SC 2/WG 2 decide to move the code positions of some limited number of characters in subsequent editions of the standard, SC 2/WG 2 would then need to add an annex to document such changes. In addition, the Canadian NP proposal suggests prefixing the hexadecimal identifier with a letter to identify the version of 10646.
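
As an illustration only, the display form described under Proposed Solution (an 8-character hexadecimal rendering of the UCS-4 code position) and the range requirement (requirement 10) could be sketched in Python as follows. The function names ucs_identifier, parse_identifier, and identifier_range are hypothetical and do not come from SC22 N1968 or the Canadian NP. The sketch assumes that Python's built-in ord() value for a character can stand in for its UCS-4 code position, and it does not model the version-letter prefix suggested in the Canadian NP.

```python
import string


def ucs_identifier(char):
    """Return the 8-digit hexadecimal display form of one character's code position."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # "08X" pads with leading zeros to 8 digits and uses the letters A-F.
    return format(ord(char), "08X")


def parse_identifier(identifier):
    """Convert an 8-digit hexadecimal identifier (either letter case) to an integer."""
    if len(identifier) != 8 or any(c not in string.hexdigits for c in identifier):
        raise ValueError("identifier must be exactly 8 hexadecimal digits")
    return int(identifier, 16)


def identifier_range(first, last):
    """List every identifier from first to last, inclusive (requirement 10)."""
    start, end = parse_identifier(first), parse_identifier(last)
    if start > end:
        raise ValueError("range start must not exceed range end")
    return [format(code, "08X") for code in range(start, end + 1)]
```

For example, ucs_identifier("A") yields "00000041", and identifier_range("00000041", "00000043") yields the three identifiers for "A", "B", and "C". Because the identifier is fixed-length (requirement 8), validation reduces to a length check plus a digit check.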