From: Dean Snyder (dean.snyder@jhu.edu)
Date: Tue Jan 13 2004 - 16:23:32 EST
Two basic models for encoding cuneiform have been discussed - dynamic and
static.
* The dynamic model would encode approximately 300 base (or "simple",
or "primitive") cuneiform characters along with 14 character modifiers in
a system that would allow cuneiformists to dynamically create "all"
cuneiform signs.
* The static model would hard-code "all" of the approximately 1000
cuneiform signs, including base signs, modified base signs, and base
signs that have other base signs embedded within them.
The differences between the two systems are roughly analogous to the
differences between encoding the character "A" and the character "COMBINING
ACUTE ACCENT" as separate code points versus encoding "A WITH ACUTE ACCENT" as
a single code point. The dynamic model is more elegant and extensible,
but more complex; the static model is more of a brute-force, fixed approach, yet simpler.
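(As a minimal illustration in more familiar Latin-script terms, the following
Python sketch shows the same precomposed-versus-combining distinction; it uses
the real Latin characters from the analogy above, not any proposed cuneiform
code points.)

    import unicodedata

    # "Static" analogue: one precomposed code point.
    static = "\u00C1"    # LATIN CAPITAL LETTER A WITH ACUTE
    # "Dynamic" analogue: a base character plus a combining modifier.
    dynamic = "A\u0301"  # "A" + COMBINING ACUTE ACCENT

    print(len(static), len(dynamic))  # 1 2 (different code point counts)
    # Normalization maps one form onto the other; a renderer is expected
    # to draw the same glyph for both.
    print(unicodedata.normalize("NFC", dynamic) == static)  # True
    print(unicodedata.normalize("NFD", static) == dynamic)  # True

The combining behavior of cuneiform modifiers would of course be far richer
than a single accent, but the encoding principle at issue is the same.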
Cuneiform as a script system was dynamic in its early periods; the
scribes productively introduced new signs with new meanings by applying
various standardized modifications to base signs. For one example, see
the graphic attached to this email of the LU2 ("human being") sign with
several of its modifications. For several more representative
examples, see <http://www.jhu.edu/ice/basesigns/>, and for a "complete"
repertoire of the base and modified signs, see the 1.3 MB PDF file at
<http://www.jhu.edu/ice/basesigns/baseAndModifiedCuneiformSigns.pdf>.
(All images are from screen shots of Steve Tinney's Classic Cuneiform font.)
Recently I proposed that we rethink the decision made at the Initiative for
Cuneiform Encoding conferences to statically encode cuneiform. The
reaction has been mixed, but I consider only two of the objections to be
material. (Excerpts from the various reactions, along with some of my
responses, are appended to this email.)
(OBJECTION) The dynamic model is too fragile; unencoded glyphs will be
"hidden" in the Private Use Area or in OpenType glyph tables.
(RESPONSE) Not being a font designer, I called a font designer friend
of mine, and he DID say there are tool and operating system problems
associated with non-code-point-specified glyphs in OpenType. He
specifically mentioned VOLT and FontLab. For what it's worth, I have seen
a difference between Jaguar and Panther in how Mac OS X treats characters
in the PUA - in Panther they commonly show up as the indeterminate
glyph symbol even when a suitable font that worked in Jaguar is installed.
(OBJECTION) The dynamic model is too complex; it will require a
specified syntax.
(RESPONSE) Yes, we will need to specify a syntax and associated
properties for the modifier characters, namely what the permissible
character sequences are and how the modifier characters react with the
base characters and with one another.
I have identified 14 cuneiform modifiers as candidates for encoding, and I
divide them into 3 major sub-groups (a rough illustrative sketch follows the list):
DECORATORS
Gunu - parallel, small wedges added to a base sign
Sheshig - parallel "winkelhakens" added to a base sign
Nutillu - wedges deleted from a base sign
Curved - curvature added to the base sign (used only for numbers)
ORIENTERS
Tenu - slant a sign 45 degrees clockwise
Inverse - flip a sign 180 degrees vertically
Reverse - flip a sign 180 degrees horizontally
POSITIONERS
Infix - embed one sign in another
Affix - place one sign after an infixed sign
Cross - cross two of the same signs
Oppose - oppose two of the same signs
Square - arrange four of the same signs in a cross
Superpose - place one sign over another
Postfix - place one sign after another (making a compound sign)
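To make the grouping above concrete, here is a rough, purely illustrative
Python sketch; the group labels, modifier strings, and the example sequence
are placeholders of my own, not proposed character names, code points, or syntax.

    # Illustrative grouping of the 14 candidate modifiers listed above
    # (placeholder strings, not proposed Unicode character names).
    MODIFIER_GROUPS = {
        "decorator":  ["GUNU", "SHESHIG", "NUTILLU", "CURVED"],
        "orienter":   ["TENU", "INVERSE", "REVERSE"],
        "positioner": ["INFIX", "AFFIX", "CROSS", "OPPOSE",
                       "SQUARE", "SUPERPOSE", "POSTFIX"],
    }

    assert sum(len(v) for v in MODIFIER_GROUPS.values()) == 14

    # Under the dynamic model, a complex sign is just a character sequence,
    # for example LU2 with ESH2 infixed (hypothetical names):
    lu2_with_esh2_infixed = ["SIGN LU2", "POSITIONER INFIX", "SIGN ESH2"]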
I will suggest syntax rules for these modifiers in a subsequent email. In
the meantime, I would appreciate any technical feedback on the issues
presented here. (For instance, as an example of something I haven't
discussed here, how should markup affect our decision?)
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
www.jhu.edu/digitalhammurabi
---------------------------------
APPENDIX A
SUMMARY OF RESPONSES TO A PROPOSAL TO DYNAMICALLY ENCODE CUNEIFORM
FOR A DYNAMIC MODEL
1 It's more powerful.
2 It fits best with the Unicode model.
3 It mirrors the way the actual script works.
4 It allows for new signs without new encodings.
I agree with all these assessments.
AGAINST A DYNAMIC MODEL
(My responses follow each in parentheses.)
1 It's too much work to do a dynamic model proposal now. (Actually,
it's not. We already have all the modifiers identified and all the base
characters both identified and designed into two existing fonts. We only
need to specify the syntax and properties for the modifiers.)
2 It's too late to change to a dynamic model. (No, it's not. See # 1 above.)
3 There won't be that many newly discovered signs. (That's an educated
guess, not a fact.)
4 One model is as good as the other, so let's just stick with what we
have. (One model is NOT as good as the other, not if one accepts the
Unicode model.)
5 Complex sign shapes can be context-bound and the dynamic model makes
this more difficult to capture. (This is a font issue.)
6 Dynamic glyphs are much more difficult to render - a dynamic approach was
rejected for CJK. (Then perhaps we should throw out Unicode IPA and Devanagari and
Hebrew, ... and encode all the possible character combinations as
separate code points? For example, how is CUNEIFORM SIGN LU2 + CUNEIFORM
POSITIONER INFIX + CUNEIFORM SIGN ESH2 + CUNEIFORM ORIENTER TENU any
more complex to render than, say, HEBREW LETTER DALET + HEBREW POINT DAGESH
+ HEBREW POINT QAMATS? Cuneiform glyph formation is nowhere near as
complex as CJK's; a concrete look at the Hebrew sequence follows this list.)
7 The dynamic model is too fragile; unencoded glyphs will be "hidden"
in the Private Use Area or in OpenType glyph tables. (Now, we are moving
into real objections. Not being a font designer, I called a font designer
friend of mine and he DID say there are tool problems and operating
system problems associated with non-code-point-specified glyphs in OpenType.)
8 Dynamic is too complex - it will require a specified syntax. (Yes, we
will need to specify a syntax and associated properties for the modifier
characters, namely what the permissible character sequences are and how
the modifier characters react with the base characters and with one another.)
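As a concrete point of reference for # 6 above, the Hebrew sequence cited there
is an ordinary combining-character sequence already encoded in Unicode. A short
Python check (the Hebrew code points are real; the cuneiform names in the
comparison remain hypothetical):

    import unicodedata

    # DALET + DAGESH + QAMATS, as cited in # 6 above.
    dalet_dagesh_qamats = "\u05D3\u05BC\u05B8"
    for ch in dalet_dagesh_qamats:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+05D3  HEBREW LETTER DALET
    # U+05BC  HEBREW POINT DAGESH OR MAPIQ
    # U+05B8  HEBREW POINT QAMATS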
-----------------------------
APPENDIX B
EXCERPTS FROM RESPONSES TO A PROPOSAL TO DYNAMICALLY ENCODE CUNEIFORM
ARRANGED CHRONOLOGICALLY
[I know it is always risky to pull quotes out of context, but I have
attempted to represent the authors' intentions fairly. Full responses can,
of course, be supplied.]
Patrick Andries: "I believe it is a more powerful system (an open one)
but it will depend on those fonts and keyboards being developed."
Rick McGowan: "Bringing up a fundamental model issue like this again at
this stage (6 weeks after the current proposal was presented to UTC)
could potentially derail the cuneiform encoding process indefinitely."
Michael Everson (an author of the static proposal): "Out of the question.
We have accepted a different model."
Christopher Fynn: "This fits in best with the Unicode character encoding
model and is definitely the way to go, particularly if the script was
productive. ... I think it is always a good idea to closely mirror in
encoding the way a script system actually works - and break it down into
primitives or base characters, combining marks and modifiers."
William Overington: "The Unicode encoding of cuneiform needs, in my
opinion, to be encoded to last. Each displayable glyph needs a formal
Unicode code point, otherwise the glyphs will end up either encoded using
Private Use Area code points all over the place or else being hidden away
in glyph tables within OpenType fonts."
Steve Tinney (an author of the static proposal): "This approach has a lot
to commend it, and I came to ICE1 with this suggestion. There was
substantial discussion of the pros and cons, and I ended up feeling that
encoding the complex signs as characters was as good a way to go as any.
I would not advocate changing that decision."
Lloyd Anderson: "Like several of us (including Feuerherm, Tinney,
recently Dean), I have considered the possibility of encoding containers
x contents. I wavered in favor of it at some points, but not now. ... The
greatest advantage of an encoding as [container x contained]
would be that it accommodates additional signs, and no change to the
default (binary or default sorting tables) would be necessary to
accommodate them. There are *many* such additional signs (contra Tinney..."
Karljuergen Feuerherm (an author of the static proposal): "As ought to be
clear from my many postings on the subject over the last n years, I have
generally favoured encoding at low level and using combinations of some
kind to describe the more complex items, and have advocated this time and
again. To be honest, I have never really felt that the pros and cons were
thoroughly thought through, and this has been a disappointment to me.
However, at this point, I am not at all interested in reopening any past
arguments of any kind. We've got a preliminary proposal in, which has
taken a certain direction, and we must, for pragmatic reasons, maintain
it. ... As much as I hate to say it, a mediocre functional encoding is
still better than no encoding at all."
Michael Everson: "Whatever the merits of one possible encoding over
another may be in theory, it should be remembered that one of the reasons
the static-glyph model was preferred over the dynamic-glyph model is that
it is far easier to render. It would be possible to encode Chinese
characters with dynamic fiddly bits which would interact with other base
characters. But it'd be a font nightmare. There's no payoff."