L2/99-382
Comments
to accompany a U.S. NO vote on JTC1 N5999, SC2 N3393, New Work item
proposal (NP) for an amendment of the Korean part of ISO/IEC 10646-1:1993.
December 9, 1999
1. The U.S. strongly objects to this New Work
item proposal (NP), because of its obvious potential to cause a destabilization
of a major, implemented portion of the International Standard 10646. The stated intent of the NP is to completely
reorganize and recode the Korean characters in the standard. That intent runs
directly counter to the stated policy of SC2/WG2 not to move, delete, or rename
any character in the International Standard once standardized—a policy which is
also strongly supported and maintained by the Unicode Technical Committee in
its parallel development of the industry Unicode Standard, in synchronization
with the International Standard, ISO/IEC 10646.
The last reorganization of Korean in ISO/IEC 10646 was the result of Amendment 5. That
reorganization caused major disruptions to implementations of the standard,
occasioned much controversy at the time, and has had lingering effects on
current implementations. The result of
that amendment (which some in the character standards community still consider
a “fiasco”) was to harden the resolve of all parties involved to never, ever
allow such a reorganization of the standard to occur again. Another such reorganization
would be exceedingly damaging to the standard, and the cost to the implementing vendors
would easily run to many millions of dollars. Accordingly, it is the firm view of the U.S.
committee that the experience of Amendment 5 should not be taken as setting a
precedent for periodic reevaluation and reorganization of the Korean encoding,
but rather as the demonstration case that such reorganization is disruptive and
damaging and should never be allowed to occur again in the standard.
Based on this consideration alone, the U.S. vote could not be turned to a YES by
anything less than a near-complete withdrawal of all of the contents of this NP
and its substitution by a request merely for the encoding of some
repertoire of new characters not already encoded in the standard.
2. Regarding the specific parts of the NP, the
U.S. has the following comments.
2.1. The amendment of the Korean jamo proposed in
Annex A of JTC1 N5999 (“Korean Character Combining Alphabet”) would constitute
a rearrangement (and hence re-encoding) of existing encoded characters. That is clearly unacceptable. The NP
“proposed to add 8 characters in the character subset of Korean combining
alphabets (1100-11FF)...” However, that claim of 8 additional characters is
correct only in terms of the accounting of total assigned code points in the
block.
An examination of Annex A in some detail shows that more than 8 new characters are
being proposed, and that some encoded characters already existing in that block of
ISO/IEC 10646-1 are not accounted for in the new proposal. Just considering the
initial (choseong) part of the table, it appears that Annex A proposes 6 new
characters (at positions 1113, 1114, 1117, 1120, 1128, and 112A), and proposes
to omit two characters already encoded in 10646: U+1134 and U+1146.
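The difference between a net count of code points and the actual set of changes can be made concrete with a small sketch. The repertoires below are hypothetical placeholders, not the contents of the 1100-11FF block or of Annex A, and the function name is chosen here for illustration; only the method of comparison is the point.

```python
def repertoire_diff(existing, proposed):
    """Compare two repertoires by character identity, not by count."""
    added = proposed - existing      # identities present only in the proposal
    dropped = existing - proposed    # identities the proposal does not account for
    net = len(proposed) - len(existing)
    return added, dropped, net

# Hypothetical placeholder identities, for illustration only.
existing = {"A", "B", "C", "D"}
proposed = {"A", "B", "E", "F", "G"}

added, dropped, net = repertoire_diff(existing, proposed)
print(sorted(added), sorted(dropped), net)
```

In this toy case the net change is one, yet three identities are new and two are unaccounted for. That is the same pattern noted for the choseong part of Annex A, where six positions are new and two encoded characters are omitted.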
The only way to salvage this part of the proposal would be to make a new proposal
(accompanied by the WG2 Summary Proposal Form) containing the exact list
of new conjoining jamo characters being proposed, along with their identities
and exemplifications in print. In particular, there would need to be
justification for why such proposed entities as Annex A 1113 ... INITIAL
CONSONANT HYOBADAKSORI-NIUN should be considered a separate character from the
existing U+1102 HANGUL CHOSEONG NIEUN, and so forth.
2.2. The proposal in JTC1 N5999 suggests a
rearrangement of all of the 11172 Korean syllables already encoded in 10646, to
follow the jamo order of KPS 9566-97. This suggestion is also completely
unacceptable; it would destabilize existing implementations. The suggestion to negotiate some kind of
“third proposal” would make no difference—it would still represent a destabilization
of already standardized characters.
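The scale of the disruption follows from how the 11172 syllables are laid out. In 10646 and the Unicode Standard the syllable block starts at U+AC00 and is ordered arithmetically by initial, medial, and final jamo (19 x 21 x 28 = 11172). The sketch below shows that arithmetic; the helper names are chosen here for illustration. Because each code point is a function of the jamo indices, a different jamo order yields different code points for the same syllables.

```python
S_BASE = 0xAC00                          # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # initials, medials, finals (incl. "no final")

def compose(l, v, t=0):
    """Code point of the syllable with the given jamo indices."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l * V_COUNT + v) * T_COUNT + t

def decompose(cp):
    """Inverse of compose: (initial, medial, final) indices."""
    s = cp - S_BASE
    if not 0 <= s < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(s, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    return l, v, t

print(L_COUNT * V_COUNT * T_COUNT)   # total number of syllables
print(hex(compose(18, 20, 27)))      # last syllable in the block
```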
IS 10646 has never been under any obligation to follow strictly the ordering of
characters in any particular national standard. Transcoding technology is readily available for converting
data expressed in IS 10646 for interchange purposes into any particular
national standard, including KPS 9566-97. The U.S. Committee would simply invite the Standardization Committee of
the DPRK to publish, in machine-readable form, its transcoding table between
the existing IS 10646 and KPS 9566-97. That would facilitate conversion and
eliminate the “confusion and difficulty in the information interchange.”
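As a minimal sketch of how such a machine-readable table might be used, assume a plain-text file with one mapping per line: the local code and the 10646 code point, both in hexadecimal, with "#" starting a comment. That file format, the function names, and the sample code values are assumptions made here for illustration; they are not actual KPS 9566-97 values.

```python
def load_table(text):
    """Parse 'LOCAL<whitespace>UCS' hex pairs into a local-to-UCS dict."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        local, ucs = line.split()
        table[int(local, 16)] = int(ucs, 16)
    return table

# Hypothetical sample rows; NOT real KPS 9566-97 code values.
SAMPLE = """
0xB0A1  0xAC00   # placeholder local code -> U+AC00
0xB0A2  0xAC01
"""

to_ucs = load_table(SAMPLE)
from_ucs = {ucs: local for local, ucs in to_ucs.items()}   # reverse direction

def ucs_to_local(s):
    """Transcode a string to a list of local codes; KeyError if unmapped."""
    return [from_ucs[ord(ch)] for ch in s]

print([hex(c) for c in ucs_to_local("\uac00\uac01")])
```

With a published table of this kind, conversion in either direction is a simple lookup, and no change to the code positions in 10646 is required.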
For ordering purposes, alternative technologies also exist. The appropriate way to
adapt data represented in the UCS to local requirements for ordering is either
A) to transcode the data into the local character encoding and sort
conventionally using that local encoding, or B) to make use of the collation key
generation mechanisms described in the Unicode Collation Algorithm and in FCD3
14651 International String Ordering.
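The second approach can be illustrated with a simplified sort key. The sketch below is not the Unicode Collation Algorithm or ISO/IEC 14651; it only shows the principle that a local ordering is derived from a key computed over unchanged code points. The initial-consonant order used here (plain consonants before doubled ones) is an illustrative assumption, not taken from KPS 9566-97, and all names are hypothetical.

```python
S_BASE, V_COUNT, T_COUNT, S_COUNT = 0xAC00, 21, 28, 11172

# Illustrative tailoring: encoded initial-consonant indices, listed in the
# desired sort order (plain consonants first, doubled consonants last).
INITIAL_ORDER = [0, 2, 3, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18, 1, 4, 8, 10, 13]
RANK = {index: rank for rank, index in enumerate(INITIAL_ORDER)}

def char_key(ch):
    """Key for one character; Hangul syllables use the tailored initial rank."""
    s = ord(ch) - S_BASE
    if 0 <= s < S_COUNT:
        l, rest = divmod(s, V_COUNT * T_COUNT)
        v, t = divmod(rest, T_COUNT)
        return (0, RANK[l], v, t)
    return (1, ord(ch), 0, 0)        # simplification: everything else sorts after

def sort_key(s):
    """Key for a whole string: the sequence of per-character keys."""
    return tuple(char_key(ch) for ch in s)

words = ["\ub098", "\uae4c", "\uac00"]      # U+B098, U+AE4C, U+AC00
print(sorted(words))                         # plain code point order
print(sorted(words, key=sort_key))           # tailored order
```

In code point order U+AE4C (doubled initial) sorts before U+B098; under the tailored key it sorts after. The encoded data is identical in both cases.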
2.3. The proposal requests that an additional 80
symbolic characters be encoded in 10646. This set is claimed to consist of
characters in KPS 9566-97 but not in ISO/IEC 10646-1.
The actual set is a very mixed collection, and includes some characters that are
already encoded (U+2601, U+2607, fractions) with slightly different glyphs,
some characters included in other proposals under consideration by WG2 (arrows,
circled numbers), some “characters” that would probably not meet the criteria
of WG2’s character/glyph model for appropriateness as encoded characters
(apostrophe off centre), emphasized Korean Hangul syllables intended to spell
out the personal names Kim Il Sung and Kim Jong Il (“Kim” encoded
twice, and “Il” encoded twice), and yet another representation of the Korean
jamo alphabet (Korean Compatibility Alphabet XYZ) with no particular
justification presented.
While a number of these 80 symbols may in fact be acceptable characters for inclusion
in 10646, the appropriate mechanism to use for their consideration would, once
again, be submission of the repertoire, along with a WG2 Summary Proposal Form,
with detailed explication and justification for the proposed characters,
including citations in printed documents. Based on that information, other
National Bodies could then provide feedback and commentary regarding which, if
any, of these characters meet the technical criteria for inclusion in 10646.
For this kind of small repertoire, no NP is really necessary.
2.4. JTC1 N5999 proposes adding the
encoding representations of each of the ideographs in KPS 9566-97 to the
published code table of CJK Unified Ideographs in 10646. The U.S. considers
this unnecessary. It is clearly not feasible for the publication of the second
edition of ISO/IEC 10646-1, which is nearly complete. But even in the future,
the main function of printing the encodings for various CJK sources in the
standard itself is to establish the normative identity of characters by their
values in the official source sets for the unification. Among other things,
this clarifies the instances where the source set separation rule was invoked,
thereby preventing the unification of a character that would otherwise have
been unified on shape and semantic criteria.
For all other purposes, the transcoding tables between 10646 and various Asian
national character encoding standards are best maintained as separate,
machine-readable files, not printed in the standard itself. KPS 9566-97 is just
one of many national and vendor Asian character encodings whose values
are not printed in the 10646 CJK Unified Ideographs code tables. Its inclusion
is not necessary, and would in fact detract from the usefulness of the
normative encodings from the source sets which are currently printed there.
2.5. The proposal in JTC1 N5999 suggests the
renaming of all the Korean characters in 10646, to replace the term “HANGUL”
with the term “KOREAN CHARACTER”. The U.S. firmly opposes the changing of any
character names in the standard. The names are a normative part of the
standard, referred to in many implementations. Changing them would cause major
disruptions. The concern expressed in the proposal about misunderstanding of
the term “Hangul” seems misplaced to us. The U.S. sees no indication that
implementors are confused about the use of the term “HANGUL” in the normative
names as referring to Korean characters. In any case, if there is any
misunderstanding about terminology or confusion with competing terms such as
“Hanmal”, “Hansik”, or “Hanminjok”, etc., this concern could be addressed by a
minor editorial note in 10646 explaining the intent of the usage of “HANGUL”,
rather than by a wholesale replacement of normative character names.
In addition, the U.S. approved Resolution M37.12 (Feedback to D.P.R. of Korea) at the
last meeting of JTC1 SC2/WG2 in Copenhagen.
We attach the text of this resolution, which the SC2 secretariat has
undertaken to communicate to the D.P.R.K.
RESOLUTION M37.12 (Feedback to D.P.R. of Korea): Unanimous
With reference to the NP in
document N2056 to amend the Korean encoding of Amendment 5, WG2 instructs its
convener to inform SC2 to respond to the Committee for Standardization of the
D.P.R. of Korea:
a) that WG2 cannot support this NP because any reordering of the standardized
Korean Hangul characters would harm existing implementations that are using the
standard including its Amendment 5,
b) that existing standardized character names cannot be changed because character
names are normative in the standard and changing them would harm existing users
of these standardized character names,
c) invite them to make concrete proposals to add any missing characters following
the existing WG2 Procedures and Guidelines document (JTC1/SC2/WG2 N2002), and
the conventions for naming of characters in the standard, for future
consideration by WG2,
d) invite them to participate in the IRG regarding any Hanja character
requirements they may have, and
e) draw their attention to FCD-3 of ISO/IEC 14651 -- international ordering under
ballot in SC22.