Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 24 2004 - 18:31:28 CDT


    Philippe asked:

    > > In fact, any existing
    > > MCW/ASCII-encoded file of Hebrew text is, in fact, also
    > > MCW/Unicode-encoded since the representation of Basic Latin
    > > characters at the character encoding form and character
    > > encoding scheme levels is exactly the same for ASCII as it is
    > > for Unicode:
    > > Hebrew   MCS/ASCII            MCS/Unicode
                 ^^^                  ^^^
    recte:       MCW                  MCW

    > >          literal code unit    literal UTF-8
    > > ------------------------------------------------
    > > alef     )       0x29         )       0x29
    > > bet      B       0x42         B       0x42
    > > gimel    G       0x47         G       0x47
    > > ...
    > > To encode any differently from this in Unicode to support MCW
    > > texts would have been fairly bad news for the people that use
    > > it.
    >
    > Is it a joke?

    No, it is not a joke.

    This is precisely what people have been doing for years to represent
    non-Latin data in "ASCII".

    Peter's point is that the Michigan-Claremont-Westminster Biblical
    Hebrew code is a set of conventions for using ASCII characters
    to represent Hebrew letters (and points and other Biblical
    annotations). Anyone using such conventions in their data
    of course depends on the deliberate continuity between ASCII
    and Unicode for the interpretation of U+0000..U+007F. This is
    neither accident nor particularly surprising.
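
    As a small illustration of that continuity, here is a sketch in
    Python. It is not taken from any MCW documentation; the sample
    string simply reuses the three letters from the table quoted above,
    and the variable names are mine. Encoding an all-ASCII string as
    ASCII and as UTF-8 yields the same bytes:

```python
# Three MCW letters from the table quoted above: alef ")", bet "B", gimel "G".
mcw_text = ")BG"

ascii_bytes = mcw_text.encode("ascii")
utf8_bytes = mcw_text.encode("utf-8")

# For code points U+0000..U+007F the two encodings produce identical bytes.
assert ascii_bytes == utf8_bytes
print([hex(b) for b in utf8_bytes])  # ['0x29', '0x42', '0x47']
```

    So a file that was MCW in ASCII needs no conversion at all to be
    MCW in UTF-8.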

    > You can't say that the table above is ASCII, or Unicode either.
    > It's only a separate legacy 7-bit encoding..

    No, it is not. It is not a character encoding standard, but
    rather a set of transcriptional conventions for the use of
    ASCII characters to represent linguistic data.

    *Displaying* or *printing* such data then involves an interpreter
    of those conventions -- which might be as simple as an ASCII-encoded
    font hack.
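
    Such an interpreter can be sketched as a plain lookup table. The
    Python below is only an illustration under stated assumptions: the
    function name mcw_to_hebrew is made up, passing unmapped characters
    through unchanged is my choice, and the table holds just the three
    rows quoted earlier in this thread. The rest of the MCW conventions
    (remaining letters, points, annotations) are left out rather than
    guessed at.

```python
# Partial MCW-to-Hebrew table: only the three rows quoted earlier in this
# thread. The remaining letters, points, and annotations are not filled in.
MCW_TO_HEBREW = {
    ")": "\u05D0",  # alef
    "B": "\u05D1",  # bet
    "G": "\u05D2",  # gimel
}

def mcw_to_hebrew(text: str) -> str:
    """Replace each mapped ASCII character; pass anything unmapped through."""
    return "".join(MCW_TO_HEBREW.get(ch, ch) for ch in text)

rendered = mcw_to_hebrew(")BG")
print(rendered)
```

    The stored data stays ASCII throughout; only the display step maps
    it onto Hebrew letters.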

    > which is probably
    > not widely interoperable because unimplemented or not documented
    > in the same common places as where ASCII and Unicode are defined.

    It isn't a character encoding, Philippe.

    And it was (and is) as interoperable as it was intended to be --
    it works for the group of people who shared those conventions.

    Along these lines, and following up on the University of Leiden
    site with the store of Punic and Neo-Punic inscriptions, one can
    note there such listings as a Libyan Punic inscription, Lepcis Magna N14,
    recorded as:

    1) mynkd #q`ysr! #`wgsTs! bn `l+m rb mHnt p`mt `srw'Ht
       wmynkd p`m't `sr w'rb` wt+H+t mSlt `sr hmSlm p`m't `sr wHmS `d[r
       
    Now whatever we decide about the status of Punic (whether the
    characters in the original inscription are a variety of an Old
    Canaanite script separately encoded from Hebrew, or whether they
    are font variants of the already-encoded Hebrew), it should be
    pretty obvious that the actual data sitting in the Leiden
    database here, and dished up on the web pages, is *ASCII* data.
    It represents a set of transcriptional conventions for the Punic
    data, rather than an exact representation of the original
    characters.

    This kind of data, by the way, is what Michael Everson keeps pointing
    to as widespread in Semitic studies, and finding matches in it takes
    more than blind reliance on Hebrew string matching.
    Even after you get the 22-letter conventions between the ASCII
    Latin forms and Hebrew lined up correctly, you *still* need to
    account for all the *other* conventions in use, including
    morphological marking and editorial conventions for lacunae,
    interpolated text, and everything else that might be sitting in
    the data. Compared to those kinds of problems, the issue of
    whether the 22 basic Semitic letters can also be represented in
    a Phoenician script or not pales to the minor molehill it actually
    is, in my opinion.

    --Ken


