INCITS / L2 ANNUAL REPORT
Covering the Period from June 2003 through August 2004
Title of INCITS Subgroup L2: Character Sets and Internationalization
2004-08-16
Informal Description of Work.
L2 is the US-TAG for character
sets (JTC1/SC2) and for internationalization (JTC1/SC22/WG20). Both subject
areas are essential for the development of well-globalized, internationally
usable systems and applications. They are particularly important in the rapidly
growing marketplace of international WWW access and electronic commerce.
The number of L2 members is presently 11. The continued interest in L2 and
the stability of the membership stem from the following:
· The number of Unicode/ISO 10646 based products continues to grow, both in number
and in diversity (more companies are implementing Unicode/ISO 10646);
· Support for Unicode/ISO 10646 on the World Wide Web continues to increase in areas
such as XML;
· Unicode/ISO 10646 is supported in additional programming and scripting languages
such as C#, and is better recognized in the C standard;
· L2 is the TAG for SC22/WG20 (Internationalization), which more companies recognize
as important because internationalization has become a mission-critical feature
of their products.
L2 works very closely with the
Unicode® Consortium - all technical meetings are co-located.
L2 meets 4 times per year.
Co-location with the Unicode Technical Committee meetings is economical, since
most of the members of L2 are also members of the Unicode Consortium and the
subject matters overlap widely.
L2 has made the complete switch to electronic document distribution via our
web site (password protected). All documents are maintained and archived
electronically by default; having good internet access for meetings is now a
requirement. Paper documentation is made available for those participants who
prefer to work with paper, but it is not the default format. For those writing
systems where there currently is no technical solution available (that is, the
script is not yet in ISO 10646 or legacy character sets, it is not used
electronically and there is no means to input, render or display the text),
scanned copies of the relevant documents are used.
For SC2/WG2:
Amendment 2 to ISO/IEC 10646-1:2000 and Amendment 1 to ISO/IEC 10646-2 were
published last December as a merged standard: ISO/IEC 10646:2003.
Both of these amendments added scripts to the repertoire of ISO 10646 in the
Basic Multilingual Plane and the supplementary planes.
The work on ISO/IEC 10646:2003
continues on schedule. Two new
amendments were initiated this year: the first contains scripts like Glagolitic, Coptic and Georgian; the second contains
scripts like N’Ko, Phags-Pa,
Phoenician and Cuneiform.
For SC2/WG3:
WG3 (7- and 8-bit codes and their extension) was disbanded at the last SC2 plenary meeting in June 2004, after completing all projects (ISO/IEC 8859-7 and ISO/IEC 2375 were the last two). Any further work needed on the existing WG3 projects will take place at the SC2 level.
For SC22/WG20:
This last year was a crucial one in terms of internationalization standards.
Recent years have shown a disconnect between
the internationalization work underway in the Unicode Consortium (and by
extension, L2) and the internationalization projects in WG20. One of L2’s major goals was to help
bridge this gap by better aligning internationalization standards in the industry.
The most crucial internationalization
standard in SC22/WG20, ISO/IEC 14651 (International String Ordering), was moved
to an editing group in SC2 as of the last SC2 plenary. This is highly advantageous for the
internationalization community, as it allows for much better synchronization of
work between the identification of an international character repertoire and
the ordering of that character repertoire.
Furthermore, the experts who work on character encoding are by and
large the same experts who work on string ordering. (Ideally, 14651 would have been moved to
SC2/WG2, but that proved to be controversial for a number of national bodies.)
1. ISO/IEC 14651 International String Ordering. The open project for Amendment #2
to the string ordering standard continues. The scope of that amendment is updating
the tailorable template table from its current repertoire (covering up to Unicode 3.1)
in order to cover the repertoire of Unicode 4.0 (which was released last year).
L2 participants bear the prime responsibility for drafting that updated table, which
constitutes the majority of the 14651 standard. Work is underway now, and is very
tightly coupled to the update of the synchronized standard from the Unicode
Consortium: UTS #10, Unicode Collation Algorithm. Long term, the goal is to better
synchronize the character repertoire in 10646 with the collation data in 14651.
2. ISO/IEC 15897 Registration of Cultural Elements. As a result of moving
ISO/IEC 14651 to SC2, there is only one active project in SC22/WG20: ISO/IEC 15897
(Registration procedures for cultural elements). The work on this has moved forward
very slowly, with the final disposition of comments for FCD3 and the final text of
the FDIS taking well over half a year. (L2 provided the bulk of comments to the FCD,
and it has continued to be a painful process to get the comments appropriately
handled by the project editor.) With 15897 being the only work item, the next
meeting of WG20 could very well be a conference call rather than a face-to-face
meeting.
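As a rough illustration of what a tailorable ordering table means in the ISO/IEC 14651 item above, the following Python sketch compares strings level by level (base letter, then accent, then case) and then overrides one entry for a Swedish-style ordering. The weights, the `TEMPLATE` table, and the helper names are invented for this example; they are not taken from ISO/IEC 14651 or from the Unicode Collation Algorithm data.

```python
# Toy weight table: (primary, secondary, tertiary) per character.
# All weights are hypothetical and exist only for this illustration.
TEMPLATE = {
    "a": (1, 0, 0),
    "A": (1, 0, 1),        # same letter as "a", differs at the case level
    "\u00e4": (1, 2, 0),   # a with diaeresis: same letter, differs at the accent level
    "b": (2, 0, 0),
    "z": (26, 0, 0),
}


def sort_key(text, table):
    """Build a multi-level key: all primary weights, then secondary, then tertiary."""
    weights = [table[ch] for ch in text]
    return tuple(tuple(w[level] for w in weights) for level in range(3))


def tailor(table, overrides):
    """Return a copy of the table with some entries replaced (a 'tailoring')."""
    tailored = dict(table)
    tailored.update(overrides)
    return tailored


words = ["zb", "\u00e4b", "ab", "Ab", "bb"]

# Default order: the accented letter sorts with "a".
default_order = sorted(words, key=lambda w: sort_key(w, TEMPLATE))

# Swedish-style tailoring: treat the accented letter as a separate letter after "z".
swedish = tailor(TEMPLATE, {"\u00e4": (27, 0, 0)})
swedish_order = sorted(words, key=lambda w: sort_key(w, swedish))

print(default_order)
print(swedish_order)
```

The point of the sketch is only that one shared table can serve many languages if each language supplies a small set of overrides; the real standards define far more levels of detail than this.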
With the work winding down in
SC22/WG20, there are fewer major (though no less significant) accomplishments
in the last year:
· The two FDAMs (to ISO/IEC 10646-1:2000 and ISO/IEC 10646-2) were approved,
merged into ISO/IEC 10646:2003, and published.
· Work was initiated on Amendments 1 and 2 of ISO/IEC 10646:2003.
· ISO/IEC 14651 was moved from SC22/WG20 to SC2. As noted above, this will
significantly improve the synchronization with the work on the Universal Character
Set (ISO/IEC 10646) and the Unicode Collation Algorithm.
TAG for SC2:
· Passage of the two FDAMs in ISO 10646:2003
· Maintaining synchronization with the Unicode Standard, which is widely implemented,
is crucial. This is achieved through co-location of technical meetings and strong
liaison activities. ISO/IEC 10646:2003, Amendment 1 will be in sync with the next
edition of the Unicode Standard (4.1) which is expected to be published in Spring
2005. Amendment 2 should synchronize with Unicode 5.0.
· Maintaining synchronization between ISO/IEC 14651 and the Unicode Collation
Algorithm will also be crucial moving forward.
· Additionally, fending off new proposals for new character sets (especially from
emerging markets and
Co-operation with C and C++ committees:
· Continue to work in conjunction with SC22. L2 was particularly pleased to see the
work done in SC22/WG14 on UTF-16 and UTF-32 data types. This work had been initiated
a few years back through an ad-hoc meeting organized by L2 and the Unicode
Consortium.
TAG for SC22/WG20:
· The quest for synchronization of the Unicode Collation Algorithm with ISO/IEC 14651
(International String Ordering) will hopefully be easier with the new project
alignment.
TAG for SC2:
· Synchronize Amendment 1 with Unicode 4.1 and Amendment 2 with Unicode 5.0.
· Prioritize the encoding of new scripts according to market demands and technical
readiness, including fonts.
· Political challenges:
o Various state governments in
· Unreasonable requests for pre-composed characters, especially in the Indic scripts
(vis-à-vis the matra model in 10646).
· Requirements from
TAG for SC22/WG20:
· Ensure high quality registration process definition in ISO/IEC 15897, and close out
the project.
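The concern about pre-composed Indic characters in the SC2 list above can be made concrete with a small Python sketch. In the matra model, a dependent vowel sign is encoded as its own character following the consonant, so no single pre-composed code point exists for the combination. The sample strings below are chosen for this illustration only, and the comparison with a Latin accented letter is an added contrast, not something taken from this report.

```python
import unicodedata

# Latin: "e" followed by COMBINING ACUTE ACCENT has a pre-composed equivalent.
latin = "e\u0301"

# Devanagari: consonant KA followed by VOWEL SIGN I (a matra).
devanagari = "\u0915\u093F"

# NFC composes sequences wherever a pre-composed character exists.
latin_nfc = unicodedata.normalize("NFC", latin)
devanagari_nfc = unicodedata.normalize("NFC", devanagari)

print(len(latin), "->", len(latin_nfc))            # composed into one code point
print(len(devanagari), "->", len(devanagari_nfc))  # stays as consonant + matra
```

Encoding every consonant and vowel sign combination separately would multiply the repertoire without adding expressive power, which is the usual argument for keeping the matra model.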
#194 | | |
#195 | | |
#196 | | |
#197 | August 10-13, 2004 | Redmond (Microsoft) |
Next year’s meetings.
#198 | November 15-18, 2004 | |
#199 | February 7-10, 2005 | |
#200 | May 10-13, 2005 | |
#201 | August 9-12, 2005 | |
Liaison Representatives to L2:
| Committee | Representative |
FCC | Federal Communications Commission | D. Draper-Campbell |
NISO (Z39) | National Information Standards Organization | S. McCallum |
INCITS | INCITS Standards Development Board | A. Winkler |
TC46/SC4/WG1 | | R. Barry |
J4 | COBOL | A. Bennett |
SC22/WG4 | COBOL | A. Bennett |
Liaison Representatives from L2:
| Committee | Representative |
JTC1/SC2 | Character Sets and Information Coding | M. Suignard |
SC2/WG2 | Universal Coded Character Set | M. Suignard |
SC2/WG3 | 7-bit and 8-bit Codes | M. Suignard (N/A as of June 2004 since WG3 has been disbanded) |
NISO (Z39) | National Information Standards Organization | J. Aliprand |
WG2/IRG | Ideographic Rapporteur Group | J. Jenkins |
Significant activities with
liaisons:
· WG2/IRG: L2 continues to work with the IRG to develop the most workable set of CJK
(Chinese-Japanese-Korean) characters for ISO/IEC 10646. The liaison work
includes locating quality fonts for the standard, proposing new characters as
needed and reviewing requests from the IRG.
L2’s John Jenkins continues to serve as the
a. Officers.
Present Officers:
Position | Name | Organization | Training Date |
Chair | Cathy Wissink | Microsoft | |
Vice Chair | Lisa Moore | | 7/17/00 |
International Representative | | Microsoft | 7/15/03 |
Vocabulary | Open | | |
b. Membership.
Please see the appendix at the end of the document. (To be updated.)
Membership: L2 presently has 11 members: Adobe; Apple
Computer, Inc.; Hewlett-Packard Company;
Changes since last year: Adobe joined L2. We are very pleased to have reversed the
downward trend of previous years!
The membership is more or less
stable, due in part to the members’ continued success in justifying
participation to their companies. The
economy continues to have a significant dampening effect on standards
participation.
Market relevance of
standards area: The
market relevance for this area of standardization (character sets and
internationalization) continues to be great. Most major software companies make
a significant portion of their profits from outside of the
As international interoperability becomes increasingly important, so does the
Universal Character Set. ISO 10646 is used increasingly in Java, C#, and XML, on
the web (e.g., the W3C’s work in the Character Model), and in other Internet
standards (e.g., the IETF’s Internationalized Domain Name and Internationalized
Resource Identifier work), and is considered the logical character set for
worldwide use. Many programming languages (C, C++, C#, SQL, etc.) enable the use
of ISO 10646. New data types for Unicode in programming languages (UTF-16 and
UTF-32) were approved earlier this year (see e.g. TR 19769).
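To show why separate UTF-16 and UTF-32 data types are useful, the following Python sketch counts the code units each encoding form needs for one character in the Basic Multilingual Plane and one in a supplementary plane. The two sample code points (U+2C00 from the Glagolitic block and U+12000 from the Cuneiform block) and the helper name `code_units` are choices made for this illustration; the sketch does not use the C types defined in TR 19769.

```python
def code_units(ch):
    """Return (number of 16-bit units in UTF-16, number of 32-bit units in UTF-32)."""
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    utf32 = ch.encode("utf-32-be")
    return len(utf16) // 2, len(utf32) // 4


bmp_char = "\u2C00"       # a code point in the Basic Multilingual Plane
supp_char = "\U00012000"  # a code point in a supplementary plane

bmp_units = code_units(bmp_char)
supp_units = code_units(supp_char)

# The supplementary-plane character is stored as a surrogate pair in UTF-16.
surrogate_pair_hex = supp_char.encode("utf-16-be").hex()

print(bmp_units, supp_units, surrogate_pair_hex)
```

A 16-bit type holds any BMP character in one unit but needs two for supplementary-plane characters, while a 32-bit type always holds one character per unit, so the two forms trade storage for uniformity.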
Emerging markets (e.g., SE
Asia,
Other administrative information.
None; L2 does not collect or disburse funds.
Regards
Cathy Wissink, L2 chair