L2 2003-2004 Annual Report

L2/04-375

INCITS / L2 ANNUAL REPORT

Covering the Period from June 2003 through August 2004
Title of INCITS Subgroup L2: Character Sets and Internationalization

2004-08-16

L2 Website (Password Protected)

Informal description of work

Executive Summary

Link to L2 Projects List on INCITS Website

Significant accomplishments

Significant challenges

Expected challenges

Previous year’s meetings

Next year’s meetings

Liaison activities

Membership and Officers

Future trends

Other administrative information

Informal Description of Work.

L2 is the US-TAG for character sets (JTC1/SC2) and for internationalization (JTC1/SC22/WG20). Both subject areas are essential for the development of well-globalized, internationally usable systems and applications. They are particularly important in the rapidly growing marketplace of international WWW access and electronic commerce.

Executive Summary.

The number of L2 members is presently 11. The continued interest in L2 and stability of the membership stems from the following:

· Unicode/ISO 10646-based products continue to grow in both number and diversity (more companies are implementing Unicode/ISO 10646);

· Support for Unicode/ISO 10646 on the World Wide Web continues to increase in areas such as XML;

· Unicode/ISO 10646 is supported in additional programming and scripting languages such as C#, and better recognized in the C standard;

· L2 is the TAG for SC22/WG20 (Internationalization), which more companies recognize as important because internationalization has become a mission-critical feature of their products.

L2 works very closely with the Unicode® Consortium; all technical meetings are co-located.

L2 meets 4 times per year. Co-location with the Unicode Technical Committee meetings is economical, since most of the members of L2 are also members of the Unicode Consortium and the subject matters overlap widely.

L2 has made the complete switch to electronic document distribution via our web site (password protected). All documents are maintained and archived electronically by default; having good internet access for meetings is now a requirement. Paper documentation is made available for those participants who prefer to work with paper, but it is not the default format. For those writing systems where there currently is no technical solution available (that is, the script is not yet in ISO 10646 or legacy character sets, it is not used electronically and there is no means to input, render or display the text), scanned copies of these documents are used.

For SC2/WG2, the U.S. TAG is INCITS/L2:

Amendment 2 to ISO/IEC 10646-1:2000 and Amendment 1 to ISO/IEC 10646-2 were published last December as a merged standard: ISO/IEC 10646:2003.

Both of these amendments added scripts to the repertoire of ISO 10646 in the Basic Multilingual Plane and the Supplementary Planes.

The work on ISO/IEC 10646:2003 continues on schedule. Two new amendments were initiated this year: the first contains scripts like Glagolitic, Coptic and Georgian; the second contains scripts like N’Ko, Phags-Pa, Phoenician and Cuneiform.

The US is contributing substantially to these projects, as the editor is Michel Suignard from Microsoft in Redmond, who is also the IR of L2.

For SC2/WG3, the U.S. TAG was (until recently) also INCITS/L2:

WG3 (7- and 8-bit codes and their extension) was disbanded at the last SC2 plenary meeting in June 2004, after completing all projects (ISO/IEC 8859-7 and ISO/IEC 2375 were the last two). Any further work needed on the existing WG3 projects will take place at the SC2 level.

For SC22/WG20, the U.S. TAG is INCITS/L2:

This last year was a crucial one in terms of internationalization standards. Recent years have shown a disconnect between the internationalization work underway in the Unicode Consortium (and by extension, L2) and the internationalization projects in WG20. One of L2’s major goals was to help bridge this gap by better aligning internationalization standards in the industry.

The most crucial internationalization standard in SC22/WG20, ISO/IEC 14651 (International String Ordering), was moved to an editing group in SC2 as of the last SC2 plenary. This is highly advantageous for the internationalization community, as it allows for much better synchronization of work between the identification of an international character repertoire and the ordering of that character repertoire. Furthermore, the experts who work on character encoding are by and large the same experts who work on ordering. (Ideally, 14651 would have been moved to SC2/WG2, but that proved to be controversial for a number of national bodies.)

1. ISO/IEC 14651 International String Ordering. The open project for Amendment #2 to the string ordering standard continues. The scope of that amendment is updating the tailorable template table from its current repertoire (covering up to Unicode 3.1) in order to cover the repertoire of Unicode 4.0 (which was released last year). L2 participants bear the prime responsibility for drafting that updated table, which constitutes the majority of the 14651 standard. Work is underway now, and is very tightly coupled to the update of the synchronized standard from the Unicode Consortium: UTS #10, Unicode Collation Algorithm. Long term, the goal is to better synchronize the character repertoire in 10646 with the collation data in 14651.

2. ISO/IEC 15897 Registration of Cultural Elements. As a result of moving ISO/IEC 14651 to SC2, there is only one active project in SC22/WG20: ISO/IEC 15897 (Registration procedures for cultural elements). The work on this has moved forward very slowly, with the final disposition of comments for FCD3 and the final text of the FDIS taking well over half a year. (L2 provided the bulk of comments to the FCD, and it has continued to be a painful process to get the comments appropriately handled by the project editor.) With 15897 being the only work item, the next meeting of WG20 could very well be a conference call rather than a face-to-face meeting.

Significant Accomplishments.

With the work winding down in SC22/WG20, there were fewer major (though no less significant) accomplishments in the last year:

· The two FDAMs (Amendment 2 to ISO/IEC 10646-1:2000 and Amendment 1 to ISO/IEC 10646-2) were approved and merged into ISO/IEC 10646:2003, which was then approved and published.

· Work was initiated on Amendments 1 and 2 of ISO/IEC 10646:2003.

· ISO/IEC 14651 was moved from SC22/WG20 to SC2. As noted above, this will improve the synchronization with the work on the Universal Character Set ISO/IEC 10646 and the Unicode Collation Algorithm significantly.

Significant Challenges.

TAG for SC2:

· Passage of the two FDAMs that were merged into ISO/IEC 10646:2003.

· Maintaining synchronization with the Unicode Standard, which is widely implemented, is crucial. This is achieved through co-location of technical meetings and strong liaison activities. ISO/IEC 10646:2003, Amendment 1 will be in sync with the next edition of the Unicode Standard (4.1), which is expected to be published in Spring 2005. Amendment 2 should synchronize with Unicode 5.0.

· Maintaining synchronization between ISO/IEC 14651 and the Unicode Collation Algorithm will also be crucial moving forward.

· Additionally, fending off proposals for new character sets (especially from emerging markets and East Asia) presents a constant challenge.

Co-operation with C and C++ committees:

· Continue to work in conjunction with SC22. L2 was particularly pleased to see the work done in SC22/WG14 on UTF-16 and UTF-32 data types. This work had been initiated a few years back through an ad-hoc meeting organized by L2 and the Unicode Consortium.

TAG for SC22/WG20:

· The quest for synchronization of the Unicode Collation Algorithm with ISO/IEC 14651 (International String Ordering) will hopefully be easier with the new project alignment.

Expected Challenges.

TAG for SC2:

· Synchronize Amendment 1 with Unicode 4.1 and Amendment 2 with Unicode 5.0.

· Prioritize the encoding of new scripts according to market demands and technical readiness, including fonts.

· Political challenges:

o Various state governments in India continue to request changes to the script model and the reordering of characters. In addition, there is significant confusion in the Indic script community concerning the character-glyph model and its relationship to rendering engines.

· Unreasonable requests for pre-composed characters, especially in the Indic scripts (vis-à-vis the matra model in 10646).

· Requirements from East Asia, e.g., compliance with an ever-increasing set of new character sets which contain characters not yet included in 10646 (e.g., new versions of HKSCS, Big-5).
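For readers less familiar with the pre-composed character issue noted in the list above, the following Python sketch may help. It is an illustration only, not part of any L2 or WG2 document, and the sample strings and the helper name `nfc_code_points` are assumptions chosen for the example. It uses the standard `unicodedata` module to show that a Latin base letter plus combining accent has a pre-composed equivalent, whereas a Devanagari consonant plus dependent vowel sign (matra) remains a two-character sequence.

```python
import unicodedata


def nfc_code_points(text: str) -> list[str]:
    """Return the code points of `text` after NFC normalization (hypothetical helper)."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# Latin: 'e' + COMBINING ACUTE ACCENT has a pre-composed form, U+00E9.
latin_sequence = "e\u0301"

# Devanagari: LETTER KA + VOWEL SIGN I (a matra). No pre-composed
# character exists for this pair, so NFC leaves the sequence unchanged.
devanagari_sequence = "\u0915\u093F"

print(nfc_code_points(latin_sequence))
print(nfc_code_points(devanagari_sequence))
```

Under the matra model, rendering a syllable correctly from such a sequence is the job of the font and rendering engine rather than of additional encoded characters.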

TAG for SC22/WG20:

· Ensure high quality registration process definition in ISO/IEC 15897, and close out the project.

Previous year’s meetings.

#193	August 25-28, 2003	Pleasanton (PeopleSoft)
#194	November 4-7, 2003	Baltimore (Johns Hopkins University)
#195	February 2-5, 2004	Mountain View (Microsoft)
#196	June 15-18, 2004	Toronto (IBM)
#197	August 10-13, 2004	Redmond (Microsoft)

Next year’s meetings.

#198	November 15-18, 2004	Cupertino (Apple)
#199	February 7-10, 2005	Mountain View (Microsoft)
#200	May 10-13, 2005	Pleasanton (PeopleSoft; tentative)
#201	August 9-12, 2005	San Jose (Adobe; tentative)

Liaison activities.

Liaison Representatives to L2:

	Committee	Representative
FCC	Federal Communications Commission	D. Draper-Campbell
NISO (Z39)	National Information Standards Organization	S. McCallum
INCITS	INCITS Standards Development Board	A. Winkler
TC46/SC4/WG1		R. Barry
J4	COBOL	A. Bennett
SC22/WG4	COBOL	A. Bennett

Liaison Representatives from L2:

	Committee	Representative
JTC1/SC2	Character Sets and Information Coding	M. Suignard
SC2/WG2	Universal Coded Character Set	M. Suignard
SC2/WG3	7-bit and 8-bit Codes	M. Suignard (N/A as of June 2004 since WG3 has been disbanded)
NISO (Z39)	National Information Standards Organization	J. Aliprand
WG2/IRG	Ideographic Rapporteur Group	J. Jenkins

Significant activities with liaisons:

· WG2/IRG: L2 continues to work with the IRG to develop the most workable set of CJK (Chinese-Japanese-Korean) characters for ISO/IEC 10646. The liaison work includes locating quality fonts for the standard, proposing new characters as needed and reviewing requests from the IRG. L2’s John Jenkins continues to serve as the US Chief Editor for the IRG. Recent work in the IRG includes identification of a minimal CJK subset (for use on smaller devices that would not have the capacity for 60,000+ ideographic characters).

Membership and Officers.

a. Officers.

Present Officers:

Position	Name	Organization	Training Date
Chair	Cathy Wissink	Microsoft	1/29/02
Vice Chair	Lisa Moore	IBM	7/17/00
International Representative	Michel Suignard	Microsoft	7/15/03
Vocabulary	Open

b. Membership.

Please see the appendix at the end of the document. (To be updated.)

Future trends.

Membership: L2 presently has 11 members: Adobe; Apple Computer, Inc.; Hewlett-Packard Company; IBM Corporation; Microsoft Corporation; Oracle Corporation; PeopleSoft; The Research Library Group, Inc.; Sun Microsystems; Sybase, Inc.; Unicode Inc.

Changes since last year: Adobe joined L2. We are very pleased to have reversed the downward trend of previous years!

The membership is more or less stable, due in part to the members’ continued success in justifying participation to their companies. The economy continues to have a significant dampening effect on standards participation.

Market relevance of standards area: The market relevance for this area of standardization (character sets and internationalization) continues to be great. Most major software companies make a significant portion of their profits from outside of the US with globalized software (e.g., Microsoft makes over 50% of their profit outside the US), and both the Universal Character Set and internationalization play a big role in this.

As international interoperability becomes increasingly important, so does the Universal Character Set. ISO 10646 is used increasingly in Java, C#, and XML, on the web (e.g., the W3C’s work in the Character Model), and in other Internet standards (e.g., the IETF’s Internationalized Domain Name and Internationalized Resource Identifier work), and is considered the logical character set for worldwide use. Many programming languages (C, C++, C#, SQL, etc.) enable the use of ISO 10646. New data types for Unicode in programming languages (UTF-16 and UTF-32) were approved earlier this year (see, e.g., TR 19769).
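As a rough illustration of the UTF-16 and UTF-32 encoding forms mentioned above, the Python sketch below counts how many code units each form needs for a character in the Basic Multilingual Plane and for one in a supplementary plane. It does not show the C data types themselves; the function name `code_units` and the sample characters are assumptions made for this example.

```python
def code_units(text: str) -> dict[str, int]:
    """Count code points and UTF-16 / UTF-32 code units in `text` (hypothetical helper)."""
    utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark
    utf32_bytes = text.encode("utf-32-le")
    return {
        "code_points": len(text),
        "utf16_units": len(utf16_bytes) // 2,  # one UTF-16 code unit is 16 bits
        "utf32_units": len(utf32_bytes) // 4,  # one UTF-32 code unit is 32 bits
    }


bmp_char = "\u0915"                # DEVANAGARI LETTER KA, in the BMP
supplementary_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, in Plane 1

print(code_units(bmp_char))
print(code_units(supplementary_char))
```

The supplementary-plane character needs a surrogate pair (two code units) in UTF-16 but a single code unit in UTF-32, which is one reason both forms are of interest to programming language committees.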

Emerging markets (e.g., SE Asia, India, Africa) continue to recognize the importance of communication world-wide. As these markets move towards greater technical capabilities, the standards, expertise and world-wide reach in L2 are becoming even more relevant.

Other administrative information.

None; L2 does not collect or disburse funds.

Regards

Cathy Wissink, L2 chair