17 September 2003
To: Dave Michael, Chairman of the INCITS Standards Policy Board
CC: Jennifer Garner, Associate Director, Standards Programs, INCITS
Reference: Letter from Elaine Keown to ANSI
Dear Dave,
Thank you for forwarding Elaine Keown’s letter of 5 August 2003.
Ms. Keown raises two major concerns: the procedure by which characters are encoded in ISO/IEC 10646, and the appropriateness of the stakeholders involved in the encoding process. I’d like to clarify a few points that Ms. Keown may not be aware of. I hope this will address both concerns to everyone’s satisfaction.
1. General procedure.
INCITS/L2 (and the Unicode Technical Committee or UTC) strives
to have an open yet rigorous procedure for character encoding. It is our goal to serve the various
linguistic and cultural communities with an appropriate character repertoire in
ISO/IEC 10646; however, there is a process by which these repertoires are
developed, both at the national level (L2) and at the international level
(SC2/WG2). All are welcome to contribute, provided they
follow these procedures.
This well-documented process for encoding characters is
available via the Unicode website (http://www.unicode.org/pending/proposals.html)
and the Principles and Procedures document available on the SC2/WG2 website (http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/principles.html). This
process is in place to ensure technical and linguistic continuity with the rest
of the standard, and has been documented after years of experience working with
proposals from numerous communities.
To date, neither SC2/WG2 nor L2 has received an encoding
proposal or contribution from Ms. Keown. She did communicate with Arnold Winkler,
former L2 chair, participated occasionally on the Unicode mail list, and
presented a paper on Hebrew at the International Unicode Conference in
2. Specific procedural issues with the Hebrew block.
Ms. Keown expressed several concerns about the construction and content of the Hebrew character block.
While developing the Hebrew repertoire, SC2/WG2 received
contributions from Hebrew academicians and linguists. The initial Hebrew block was based on
ISO/IEC 8859-8, and other characters have been added since then, following the
character encoding procedure.
With regard to her other specific statements on Hebrew:
a. Coptic was moved. As Ms. Keown rightly comments, L2/UTC and SC2/WG2 policy is against moving characters once they are encoded (see http://www.unicode.org/standard/stability_policy.html). She then states that the Coptic block was moved. However, for Coptic, no characters were moved. Rather, 58 characters were added for Coptic at positions 2C80-2CBF (reference document WG2 N2611).
b. The Hebrew repertoire is not contiguous. This is not unique to Hebrew. The repertoires for Latin, Cyrillic and Khmer, for example, are broken into several non-contiguous blocks. The ideographs needed for Chinese, Japanese and Korean are also spread across multiple planes. Placement of the character repertoire in the standard, however, has no impact on software implementation. Future character additions will be allocated as appropriately as possible; however, there is no guarantee in the standard that characters of a particular writing system will be co-located.
c. Collation is broken by the repertoire. Unicode and ISO/IEC 10646 are encoding standards, not collation standards. The location of the characters in the repertoire does not determine or impact collation order for Hebrew or any other language or writing system; sorting is determined by the implementation. There are related standards which collate the repertoire of Unicode and 10646; however, they are not part of the encoding standard. Ms. Keown should review the Unicode Collation Algorithm (Unicode Technical Standard #10, http://www.unicode.org/reports/tr10/) and ISO/IEC 14651 (International String Ordering) for more information.
d. Hebrew subsets are poorly grouped. The current subsets in 10646 were developed based on input from user communities. There is a process by which new subsets can be defined. Again, SC2/WG2 has yet to receive a formal proposal from Ms. Keown, and welcomes any contributions concerning Hebrew subsetting.
e. Only 3 Hebrew script languages are partially covered. As noted earlier, there is a formal process for encoding characters. If Ms. Keown has knowledge of additional scripts needed for encoding Hebrew, we welcome her contributions.
f. The block is missing symbols needed for
g. Some symbols are conflated and need semantic differentiation. We have yet to receive any formal proposals from Ms. Keown on the need to differentiate these symbols; again, a proposal which follows the submission guidelines is welcome.
Ms. Keown raised a concern about the decision makers in the character encoding process at the national level. She may find the following interesting:
I hope that it is clear from the above that INCITS/L2
engages in a character encoding process that is open to all interested stakeholders. In addition, this process is rigorous enough
to meet the linguistic and cultural criteria of a community and provide an
interoperable, international standard that works for global software.
Please feel free to contact me should there be any questions
or comments.
With best regards,
Cathy Wissink
Chair, INCITS/L2 (Character Sets and Internationalization)