From: Dean Snyder (dean.snyder@jhu.edu)
Date: Thu May 06 2004 - 12:47:02 CDT
Paul James Cowie wrote at 6:44 AM on Thursday, May 6, 2004:
>Somewhat echoing Deborah Anderson's contribution from a few days ago, I
>am categorically against any script unification in this matter and I
>believe that Phoenician script should be encoded separately from square
>Hebrew script - when I have the need to encode both scripts within one
>XML / XHTML document, I want to be sure that both scripts are rendered
>accurately without confusion, and without having to step through a font
>minefield.
Take this sentence - Phoenician BT 'LM, "house of eternity, grave",
occurs (with matres lectionis) in Biblical Hebrew as BYT 'WLM.
Here are the polar choices for XML:
TAGGED (but not encoded)
Phoenician <Phn>BT 'LM</Phn>, "house of eternity, grave", occurs (with
matres lectionis) in Biblical Hebrew as <Heb>BYT 'WLM</Heb>.
ENCODED (but not tagged)
Phoenician BT 'LM, "house of eternity, grave", occurs (with matres
lectionis) in Biblical Hebrew as byt 'wlm.
[using case to simulate the different encodings]
The tagged version is not a "font minefield". On the contrary, it
explicitly provides an international standard mechanism for a level of
specification and refinement not possible via encoding. You can, for
example, do things like: <Phn subscript="Punic" locus="Malta"
font="Maltese Falcon">BT 'LM</Phn>. In fact, this is precisely the sort
of thing for which XML was designed.
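To make this concrete, here is a minimal rendering sketch, assuming the
<Phn> and <Heb> tags from the example above (the stylesheet and the font
names are placeholders, not established conventions). With tagging, an
XSLT 1.0 stylesheet - not the character encoding - selects the fonts:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- render Phoenician-tagged runs in a Phoenician-style font -->
  <xsl:template match="Phn">
    <span style="font-family: 'Maltese Falcon'"><xsl:apply-templates/></span>
  </xsl:template>
  <!-- render square-Hebrew-tagged runs in a Hebrew font -->
  <xsl:template match="Heb">
    <span style="font-family: 'Ezra'"><xsl:apply-templates/></span>
  </xsl:template>
</xsl:stylesheet>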
The untagged, but differently encoded version, on the other hand, IS a
search and text processing quagmire, especially when confronted by the
possibility of having to deal with multiplied West Semitic encodings,
e.g., for the various Aramaic "scripts" and Samaritan.
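Again using case to simulate the two encodings: a search for BT that
should match regardless of script can no longer be a simple string match;
every query must first fold one encoding into the other. A sketch in
XPath 1.0, with translate() supplying the folding table (26 entries here,
one full table per script pair in reality):

//text()[contains(
    translate(., 'abcdefghijklmnopqrstuvwxyz',
                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
    'BT')]

And with each additional West Semitic encoding, every search tool needs
another such table.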
Obviously there is a need, in many cases, to maintain the distinction
between the various diascripts; the question is where should that
distinction be introduced - at the encoding level or higher?
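Markup already provides a standard place for it above the encoding: the
example sentence could carry ISO 639-2 language tags (phn for Phoenician,
heb for Hebrew) over a single, unified encoding. (The <foreign> element
below is purely illustrative; any element bearing xml:lang would do.)

Phoenician <foreign xml:lang="phn">BT 'LM</foreign>, "house of
eternity, grave", occurs (with matres lectionis) in Biblical
Hebrew as <foreign xml:lang="heb">BYT 'WLM</foreign>.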
The issue is not whether this particular proposal represents Phoenician
"script" adequately (it does); the real issue is whether Phoenician should
be separately encoded at all.
If Hebrew were not already encoded in Unicode, I could foresee two,
possibly tenable, courses of action:
* Unilaterally encode the 22 Old Canaanite letter characters as such, and
additionally encode the various supra-consonantal systems erected on this
script. This would cover practically everything we've been talking about
- Phoenician, Punic, Neo-Punic, Old Hebrew, Moabite, Ammonite, Edomite,
Samaritan, Old Aramaic, Official Aramaic, Square Hebrew, etc.
(Essentially, with some name changes, the situation we currently enjoy.)
or
* Separately encode only Old Canaanite and Hebrew/Aramaic, along with its
adjunct systems - deferring judgment on the politically-loaded Samaritan
issue. (Essentially what we would have if only the current proposal were
adopted.)
But, what I'm afraid of with this proposal, as I've stated before, is
that its adoption will set a precedent that will result in a snowballing
of West Semitic encodings, leading to the third scenario, which I find
unacceptable:
* Separately encode Phoenician, Old Hebrew, Samaritan, Archaic Greek, Old
Aramaic, Official Aramaic, Hatran, Nisan, Armazic, Elymaic, Palmyrene,
Mandaic, Jewish Aramaic, Nabataean ...
I actually have not yet made up my mind about the advisability of
encoding Phoenician/Old Canaanite; I continue to weigh the input we've
been getting here. But I am tending to think that the tradeoffs are in
favor of not separately encoding multiple West Semitic diascripts. The
only benefit to encoding I see is the enabling of rendering changes (i.e.,
font changes) in plain text. But weighed against the complexity
introduced for searching and other text processing, that benefit seems
small indeed, especially when we realize that the discipline has, in
large part, worked with unified texts for centuries.
Which segues nicely into your next remarks:
>A few contributors to this list have argued that separate encoding is
>unnecessary and shouldn't happen on the grounds that the user community
>doesn't / wouldn't make use of it.... Well, I can certainly tell you
>that my user / research community (i.e. Near Eastern history,
>archaeology and Egyptology) remains incredibly conservative in nearly
>all their practices - their current practice overall is certainly no
>guide to what *should* be happening.... Some of us *are* trying to
>pioneer and teach different practices - the use of XML / XHTML, the
>application of Unicode instead of different fonts, for example - but it
>is a slow, slow process.
I am sympathetic to this assessment of the conservative nature of many
practices in Ancient Near Eastern studies. (After all, I have personally
witnessed resistance to my, and others', efforts to encode Sumero/Akkadian
cuneiform.) But to say "their current practice overall is certainly no
guide to what *should* be happening" is too strong for me. I tend to try
to look for the best in the past in order to combine that with the best
in the present. In this particular case, that MAY mean encoding Old
Canaanite, or it may not. But I have yet to see a compelling reason to
introduce the added complexity.
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi