From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 24 2004 - 18:31:28 CDT
Philippe asked:
> > In fact, any existing
> > MCW/ASCII-encoded file of Hebrew text is, in fact, also
> > MCW/Unicode-encoded since the representation of Basic Latin
> > characters at the character encoding form and character
> > encoding scheme levels is exactly the same for ASCII as it is
> > for Unicode:
> >   Hebrew    MCS/ASCII             MCS/Unicode
                ^^^                   ^^^
         recte: MCW                   MCW
> >             literal  code unit    literal  UTF-8
> >   ----------------------------------------------
> >   alef      )        0x29         )        0x29
> >   bet       B        0x42         B        0x42
> >   gimel     G        0x47         G        0x47
> >   ...
> > To encode any different from this in Unicode to support MCW
> > texts would have been fairly bad news for the people that use
> > it.
>
> Is it a joke?
No, it is not a joke.
This is precisely what people have been doing for years to represent
non-Latin data in "ASCII".
Peter's point is that the Michigan-Claremont-Westminster Biblical
Hebrew code is a set of conventions for using ASCII characters
to represent Hebrew letters (and points and other Biblical
annotations). Anyone using such conventions in their data
of course depends on the deliberate continuity between ASCII
and Unicode for the interpretation of U+0000..U+007F. This is
neither accident nor particularly surprising.
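That continuity is easy to check. Here is a minimal sketch in Python, using only the three MCW literals from the quoted table; the sample string and variable names are illustrative and not taken from any actual MCW corpus. It shows that the same text yields identical bytes whether it is encoded as ASCII or as UTF-8.

```python
# Illustrative sample only: the three MCW literals shown in the quoted table.
mcw_text = ")BG"

ascii_bytes = mcw_text.encode("ascii")
utf8_bytes = mcw_text.encode("utf-8")

# For U+0000..U+007F, ASCII and UTF-8 produce identical code units.
same_bytes = ascii_bytes == utf8_bytes
byte_values = [hex(b) for b in ascii_bytes]

# Bytes written as ASCII decode unchanged when read as UTF-8.
round_trip = ascii_bytes.decode("utf-8")

print(same_bytes, byte_values, round_trip)
```

So an existing ASCII file of MCW text needs no conversion at all to be read as UTF-8.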
> You can't say that the table above is ASCII not either Unicode.
> It's only a separate legacy 7-bit encoding.
No, it is not. It is not a character encoding standard, but
rather a set of transcriptional conventions for the use of
ASCII characters to represent linguistic data.
*Displaying* or *printing* such data then involves an interpreter
of those conventions -- which might be as simple as an ASCII-encoded
font hack.
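Such an interpreter can be very small. The following is only a sketch, in Python: the names MCW_TO_HEBREW and mcw_to_hebrew are hypothetical, and the mapping covers just the three letters shown in the quoted table, so it is not a full MCW implementation (no points, no annotations).

```python
# Hypothetical partial mapping: only the three letters from the quoted table.
# A real interpreter of the MCW conventions would also cover the remaining
# letters, the points, and the other Biblical annotations.
MCW_TO_HEBREW = {
    ")": "\u05D0",  # HEBREW LETTER ALEF
    "B": "\u05D1",  # HEBREW LETTER BET
    "G": "\u05D2",  # HEBREW LETTER GIMEL
}


def mcw_to_hebrew(text: str) -> str:
    """Map each known MCW literal to Hebrew; pass anything else through."""
    return "".join(MCW_TO_HEBREW.get(ch, ch) for ch in text)


print(mcw_to_hebrew(")BG"))
```

The stored data stays ASCII throughout; only the interpretation applied at display time produces Hebrew characters.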
> which is probably
> not widely interoperable because unimplemented or not documented
> in the same common places as where ASCII and Unicode are defined.
It isn't a character encoding, Philippe.
And it was (and is) as interoperable as it was intended to be --
it works for the group of people who shared those conventions.
Along these lines, and following up on the University of Leiden
site with the store of Punic and Neo-Punic inscriptions, one can
note there such listings as a Libyan Punic inscription, Lepcis Magna N14,
recorded as:
1) mynkd #q`ysr! #`wgsTs! bn `l+m rb mHnt p`mt `srw'Ht
wmynkd p`m't `sr w'rb` wt+H+t mSlt `sr hmSlm p`m't `sr wHmS `d[r
Now whatever we decide about the status of Punic -- whether the
characters in the original inscription are a variety of an Old
Canaanite script separately encoded from Hebrew or whether they
are font variants of the already-encoded Hebrew -- it should be
pretty obvious that the actual data sitting in the Leiden
database here, and dished up on the web pages, is *ASCII* data,
and represents a set of transcriptional conventions for the Punic
data, rather than an exact representation of the original
characters.
This kind of data, by the way, is what Michael Everson keeps pointing
to as widespread in Semitic studies -- and you need more than
blind reliance on Hebrew string matching if you expect to find matches.
Even after you get the 22-letter conventions between the ASCII
Latin forms and Hebrew lined up correctly, you *still* need to
account for all the *other* conventions in use, including
morphological marking and editorial conventions for lacunae,
interpolated text, and everything else that might be sitting in
the data. Compared to those kinds of problems, the issue of
whether the 22 basic Semitic letters can also be represented in
a Phoenician script or not pales to the minor molehill it actually
is, in my opinion.
--Ken