From: Jim Allan (jallan@smrtytrek.com)
Date: Sat Dec 27 2003 - 00:01:10 EST
Mark E. Shoulson wrote:
> This is a particularly cogent point.  The Mishna (c. 1st century C.E.) 
> does explicitly distinguish between Paleo-Hebrew and Square Hebrew 
> (tractate Yadayim 4:5).  That's not a font-difference, that's a 
> script-difference, I think.
There were no such things as fonts in the 1st century C.E. So it would 
have to be a script-difference. But what is a "script"?
"Script", as I pointed out previously, is a word of wide meaning. The 
difference between Paleo-Hebrew and Square Hebrew is a script 
difference. But the word "script" is also used for different varieties 
of the Square Hebrew script. Check in Google for ["rashi script"] or 
["ari script"] . There is a two-volume book:
_Specimens of Medieval Hebrew Scripts_ by Malachi Beit-Arie. See 
http://www.bookgallery.co.il/content/english/static/book8177.asp
Check also in Google for ["italic script"], ["uncial script"], 
["blackletter script" OR "black letter script"].
We are talking about exactly the same alphabet (or abjad) here, 
twenty-two letters in the same order with identical meaning originating 
from the same sources recording the identical text with identical spelling.
Compare the gradual change from blackletter "scripts" to Antiqua style 
Latin characters (including the italic script) in Renaissance and 
post-Renaissance Europe. This is similar to the change from Phoenician 
style to Aramaic style.
> This is the other really significant point: Semitic scholars may all 
> agree, but all the world is not Semitic scholarship, and non-{Semitic 
> scholars} have to be satisfied as well.  Since the Semitic scholars 
> are also getting what they want, where's the harm in encoding more 
> alphabets?
Who are these non-scholars who want the Palmyrene script (for example) 
to be encoded separately from other Aramaic scripts? Who are the 
scholars who want this? How many persons in the world want Palmyrene to 
be encoded separately? As many as fifty? Or is there just Michael Everson?
There may be some such scholars, and if so I would like to hear the 
arguments they would bring forth. I'm willing to be convinced by 
arguments. I'm not an *expert* in Aramaic scripts. There aren't that 
many who are.
As to harm, where's the harm in encoding Japanese kanzi separately, or 
Latin uncial, or a complete set of small capitals as a third case? 
Where's the harm in encoding Latin Renaissance scripts separately?
No harm perhaps, but no good either. There is no need or use for such 
encodings. Scholars using Latin letters and non-scholars using Latin 
letters are not asking for separate coding of the script used in the 
Beowulf manuscript and so forth. They don't want every Latin "script" 
variation encoded separately.
>   It's not *that* simple: one could argue (as is being done) that more 
> alphabets would lead to confusion about which one should be used, and 
> mess up searches.  I guess we'd just have to make sure that people 
> doing scholarly work in Semitic languages know to use Hebrew all the 
> time (they already know that), no matter what the language.
But the point is that many of these Semitic languages use the *same* 
abjad with different styling, one such styling being the letters encoded 
in Unicode as Hebrew letters with default glyphs of modern Hebrew form. 
Only the letter shapes are different. But between some northwest Semitic 
"scripts" they are not very different, less so than between one Latin 
"script" and another.
Second, people doing work in Semitic languages using the Latin alphabet 
do also often use Latin transliterations  (which do not all agree). I 
assume that there are also standard Cyrillic transliterations used by 
scholars using the Cyrillic alphabet and so forth.
Such things are not for Unicode to regulate.
>   And in cases where material is to be incorporated from non-scholarly 
> sources who used another alphabet, that can be transcoded when entered 
> into databases to keep them uniform if that's what's necessary, but 
> presumably that wouldn't happen often.
What non-scholarly sources? Why would a non-scholar *need* or *desire* 
Palmyrene Aramaic encoded separately while a scholar would not? A change 
to a Palmyrene Aramaic font would do the job as well, for Palmyrene 
Aramaic and any of the various Aramaic "scripts" or "styles" just as a 
font change does for historical styles for European scripts if someone 
wants to print or display them. In fact such fonts do poorly, just as a 
general black letter medieval font will do poorly for anything but the 
exact manuscript on which it was based, if based on a particular 
manuscript. There are no fonts before modern times, no exactly 
standardized characters, no exactly standardized type styles. Every 
scribe has a different hand. Characters in simple charts of Semitic 
scripts are often deceptive just as charts of forms taken by medieval 
Latin characters in particular "scripts"/"styles" are deceptive, often 
being a choice made by a scholar from many variants.
Coding Aramaic generally as a single script in Unicode would code all 
the "script" variations. This has already been done by encoding the 
square Aramaic letters in their "modern Hebrew" forms. What more is 
needed for encoding? Similarly Latin has been encoded with modern Latin 
letter forms as the default glyphs and Greek has been encoded with 
modern Greek letter forms as the default glyphs. One might want some 
further final forms and additional punctuation for Aramaic styles (or 
might not). That can be decided. Otherwise, there is nothing much more 
to do, save perhaps add a matrix somewhere showing variant glyphs in 
different Aramaic "scripts"/"styles".
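As a rough illustration of what is already encoded, the following Python sketch lists the letters of the Unicode Hebrew block. It relies only on the standard unicodedata module; the variable names are mine, and the split into "base" and "final" letters is simply read off the character names, not taken from any official property.

```python
import unicodedata

# Letters of the Unicode Hebrew block, U+05D0 (ALEF) through U+05EA (TAV).
letters = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]

# Final forms are encoded as separate characters; their names contain "FINAL".
finals = [c for c in letters if "FINAL" in unicodedata.name(c)]
base = [c for c in letters if c not in finals]

for c in base:
    print(f"U+{ord(c):04X}  {unicodedata.name(c)}")

print(len(base), "base letters,", len(finals), "final forms")
```

The 22 base letters are the abjad under discussion; the 5 final forms are the kind of positional variant for which other Aramaic styles might or might not need additions.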
To take another example, all runic "scripts" have been unified in 
Unicode, though the runic "scripts" vary greatly in the number of 
letters used and in the values of the letters as well as in their 
appearance. There is more *reason* to produce separate encodings for the 
various runic scripts than for northwest Semitic "scripts", though I've 
heard no complaints about the unification of runic "scripts" and I have 
no complaints myself.
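The runic unification shows up in the character names themselves. A small Python sketch, using only the standard unicodedata module (the function name is my own), prints the name of the first character in the Runic block:

```python
import unicodedata

def runic_name(code_point):
    """Return the Unicode character name for a code point in the Runic block."""
    return unicodedata.name(chr(code_point))

# U+16A0 is the first character of the Runic block.
first = runic_name(0x16A0)
print(f"U+16A0  {first}")
```

The name printed is RUNIC LETTER FEHU FEOH FE F: a single code point whose name strings together the letter's names in several runic traditions.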
Indeed, there is no *reason* when looking at the values of the 
characters of the Semitic "scripts" related to Phoenician that there 
could not have been a single encoding for the consonants for *all* these 
supposed "scripts" (with separate encodings for the pointings).
A common Semitic encoding *could* still be added to Unicode, with 
individuals deciding whether or not to use that coding also for Arabic, 
Hebrew and Syriac.
I am not recommending this.
I am pointing out how much these scripts are seen to be stylistic 
variants of one another to one who can to some extent read them.
If one must split them up, charts and scholarly books do provide normal 
divisions of "scripts" or "styles" which correspond to those given by 
Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf 
All that has been well worked out for the common "scripts". A normal 
division is:
1.) Proto-Sinaitic and other early pictographs.
2.) Old Arabic "scripts" (Old South Arabic and Old North Arabic).
3.) Northwest Semitic (the 22-character abjad including Phoenician 
scripts, descendant Aramaic scripts such as square Aramaic used for 
Hebrew and also including Syriac).
4.) Arabic (which though descended from Nabatean Aramaic became so 
different that it might be better encoded separately, perhaps to be 
compared to the Aramaic scripts in somewhat the same way as Latin might 
be compared to early Greek scripts).
The common 22-character Northwest Semitic abjad can be broken down into:
1.) Phoenician/Canaanite scripts including Paleo-Hebrew and its 
descendant Samaritan and also Paleo-Aramaic.
2.) Later Aramaic scripts.
3.) Syriac scripts which differ greatly in appearance from the other 
Aramaic scripts.
Note: special appearance and pointing for Hebrew and Syriac is really 
the only reason to distinguish these particularly. The letters are the 
same in origin and are more the same in meaning than between Greek 
script and variant Greek script.  Greek letters in variant Greek scripts 
however are (generally) far more alike in appearance than the characters 
of the various early northwest Semitic "scripts"/"styles".
But should a difference in appearance count in a decision to code 
separately within Unicode when *every* other feature of two "scripts" is 
identical, including origin?
Hebrew scriptures were first written in the Phoenician script (= 
Paleo-Hebrew), then in Aramaic script which developed *very* slightly in 
medieval times to the normal modern Hebrew script. Everson's division 
would suggest four different scripts ought to be used for coding the 
same texts with the same logical characters with the same names, that 
texts should be encoded as Phoenician or Aramaic or Hebrew or Samaritan 
depending on style, even when letter-by-letter the same.
Cursive Hebrew still retains for some letter forms the Phoenician shapes 
(which is very strange). Should cursive Hebrew therefore be encoded 
separately?
I don't see any purpose in encoding these scripts differently in Unicode 
when they represent *exactly* the same abjad with only different styling 
of the characters.
Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf could 
only say:
<< Note that Jony Rosenne once suggested that we should not encode 
Phoenician because it is a glyph variant of Hebrew. This is not true, 
despite the one-to-one correspondence of character entities. In the Dead 
Sea Scrolls, for instance, where the Tetragrammaton is written with 
Paleo-Hebrew letters, it is (in UCS encoding terms) the Phoenician script 
in which the Name is written. >>
First, there is not *just* a one-to-one correspondence of character 
entities but also one-to-one correspondence of the characters in respect 
to their origin and names. They *are* the same abjad in all but style.
Second, if it is argued that the use of Phoenician script for the 
Tetragrammaton in some texts otherwise written in square Aramaic 
characters indicates that Phoenician and square Aramaic characters must 
be encoded separately within Unicode, should not one make the same 
argument for medieval texts with a headline "script" imitating 
traditional Roman square capitals, initial paragraphs in uncial "script" 
and the main text in Carolingian "script" including majuscule and 
minuscule letters?
If Everson's argument is applied to medieval manuscripts, uncial 
"script" and Carolingian "script" and Roman capitals should be encoded 
separately within Unicode.
Also, the Tetragrammaton is represented in the English King James 
translation of Hebrew scriptures and in some more recent translations by 
the word LORD and sometimes GOD in which all but the first letter is 
printed in small capitals. Should small capitals therefore be encoded 
separately in Unicode?
(Note: these small capitals are the small capitals normally used for 
emphasis and usually appear slightly higher than the normal lowercase 
characters lacking ascenders. They are not the same as the lower case 
small capital characters coded in Unicode as phonetic characters which 
properly appear as identical in height to other lower case characters.)
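The status of those phonetic small capitals can be checked directly. This Python sketch, using only the standard unicodedata module, looks up the four characters that would spell "lord" in phonetic small capitals; the choice of these four is mine, for illustration.

```python
import unicodedata

# Phonetic small capitals L, O, R, D as encoded in Unicode.
small_caps = ["\u029F", "\u1D0F", "\u0280", "\u1D05"]

# Collect (name, general category) for each character.
info = [(unicodedata.name(ch), unicodedata.category(ch)) for ch in small_caps]

for ch, (name, category) in zip(small_caps, info):
    print(f"U+{ord(ch):04X}  {name}  {category}")
```

Each one reports general category Ll, that is, a lowercase letter. They are separate phonetic characters, not a third case of the ordinary Latin alphabet.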
That characters of one style are used in a text written predominately in 
another style does not indicate that the "script" or "style" to which 
they belong needs to be coded independently. That is what markup is for.
Peter Kirk has already made this point in part.
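A minimal sketch of what "markup" means here, in Python. The Tetragrammaton is stored as ordinary Hebrew code points and only the surrounding markup asks for a different letter style. The helper function and the font name "Paleo Hebrew Sample" are invented for illustration; no such font is assumed to exist.

```python
from html import escape

# The Tetragrammaton as Hebrew code points: yod, he, vav, he.
TETRAGRAMMATON = "\u05D9\u05D4\u05D5\u05D4"

def styled(text, font_family):
    """Wrap text in an HTML span requesting a font; the characters are unchanged."""
    family = escape(font_family, quote=True)
    return f'<span style="font-family: {family}">{escape(text)}</span>'

# "Paleo Hebrew Sample" is a made-up font name, used only as a placeholder.
marked_up = styled(TETRAGRAMMATON, "Paleo Hebrew Sample")
print(marked_up)
```

A search for the four Hebrew letters still finds the Name inside the marked-up text, whichever style is requested for display.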
There seems to me *no* reason why most of the Aramaic "scripts" should not 
be unified within Unicode with Hebrew and almost *no* reason why 
Phoenician and Samaritan should not be unified.
And there seems to me *little* reason why Hebrew/Aramaic "scripts" and 
Phoenician/Samaritan "scripts" should not be unified. The two families 
of styles use the same abjad though with differences in appearance too 
great for most of the letters to be seen as the same letters between the 
two families by appearance alone.
But how much should visual distinction count when it is the *sole* 
difference? It appears to me that this is where the dispute mostly lies, 
despite the precedent of the Unicode encoding of runic "scripts".
There may also be some thinking of HTML/XML/XHTML web display of 
characters where forcing of font is not reliable. One would not want a 
discussion of ancient Phoenician characters to display modern Hebrew 
forms! But this same problem currently applies to runes, medieval Latin 
characters, Han characters and so forth. One shouldn't let the current 
shortcomings of one display method among many dictate Unicode encodings.
Jim Allan 