Re: Origins of ẘ from Ken Whistler on 2012-04-16 (Unicode Mail List Archive)

From: Ken Whistler <kenw_at_sybase.com>
Date: Mon, 16 Apr 2012 12:05:17 -0700

On 4/15/2012 10:04 PM, Asmus Freytag wrote:
> The 1E00 and 1F00 blocks were populated, in Unicode 1.1 by rejects
> from Unicode 1.0 that were re-admitted as part of the merger with
> ISO/IEC 10646. If you have anyone with access to the early (paper
> only) meeting documents of WG2, you might, just might, find a source
> for them.

Well, guess what -- I have access to someone with the relevant meeting
documents. ;-)

The first key document is:

WG2 N754, Review of repertoire, by Masami Hasegawa, dated September 1991.
(Mark Davis and I assisted Hasegawa-san in pulling together the lists in
this document.)

That document lists *all* of the Latin composite letter collections that
Hasegawa-san, then the editor of 10646, had to wrestle with, in order to
come up with an acceptable draft for the 2nd DIS, after the failure of the
first DIS vote and the determination by WG2 that a merger of repertoires
was necessary to construct a DIS that could pass. (A lot of other
architectural changes were necessary as well, but right now I'm focussing
on the Latin repertoire issue.)

Section 1.1.2 of WG2 N754 reads:

===============================================================

1.1.2 Latin Composites, Collection #2A

Extra Latin composites, descending from DP1 of 10646. These are derived
from a variety of sources, and are intended to cover a number of languages
and transcriptional systems (e.g. various Indo-European and Semitic
transcriptions).

===============================================================

There then follows a long list of composite characters that were in DP1
of 10646. WG2 N754 then goes on to identify which of those particular
characters were supported by explicit national requirements in the ballot
record. The remainder were winnowed down, using a list of exceptions,
explicitly spelled out on page 8 of WG2 N754. What was left constituted
the bulk of the composite Latin characters that were eventually included
in the 2nd DIS in the range 1E00..1E95, and which you see there still in
the standard.

O.k., so far so good. But you may well ask, what about 1E96..1E9A, which
includes the ẘ character? How did *those* get in?

Well, the pertinent document for that is WG2 N759, "Liaison Statements to
JTC1/SC2/WG2 considering the Arabic part of ISO DIS 10646M", from the ECMA
(European Computer Manufacturers Association) Arabic Task Group, dated
October 1991. The relevant portion of that document is Appendix A (=ECMA
ATG N213), "Tranliteration [sic] characters for Arabic characters and
Hieroglyphs", authored by Alaa Ghoneim, who at the time was representing
Egypt during the WG2 meetings. Alaa Ghoneim cites as sources ISO 233
Parts 1 and 2 (for Arabic) and the Egyptian Grammar by Gardiner for Latin
transliteration of hieroglyphs.

Part II of that Appendix says: "The following [9] characters do not exist
in 10646 and hence need to be added in plane 0", followed by 9 composite
transliteration characters from one of those two sources -- not
individually identified.

WG2 N759 was discussed at the Paris meeting of WG2 (October 7-11, 1991).
The minutes from that meeting (WG2 N767) note:

"N759 ECMA Arabic TG Input and N746 Input from Egypt
1) 9 missing characters for transliteration
==> review all transliteration characters
..."

Hasegawa-san took that under advisement and determined that 3 of the
transliteration characters in that list of 9 were in fact already in the
draft of the DIS. The remaining six are those which you now see in the
range U+1E96..U+1E9A, including the ẘ in question.

No national body objected to the inclusion of those particular 6 in the
voting on 10646 DIS 1.2, so they ended up published in the eventual
10646-1:1993 (and in Unicode 1.1).

And that, folks, is the origin of ẘ.

--Ken
Received on Mon Apr 16 2012 - 14:09:24 CDT
