Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

From: Mark E. Shoulson (mark@kli.org)
Date: Mon May 24 2004 - 21:55:28 CDT


    John Hudson wrote:

    > Peter Constable wrote:
    >
    >> I was not involved in those discussions so cannot comment on them. I
    >> just wish to point out that the MCW representation of Hebrew most
    >> certainly *is* supported in Unicode: MCW uses ASCII Latin letters and
    >> punctuation characters to stand for Hebrew letters, vowel points and
    >> accents, and those exact same ASCII characters are encoded in Unicode.
    >
    >
    > This was an 8-bit hack. The point that Elaine and other Biblical
    > Hebrew scholars make is that MCW explicitly encodes positional
    > distinctions between some marks that the Unicode Hebrew block
    > unifies. This means that while MCW text can easily be converted to
    > Unicode Hebrew, the conversion cannot be round-tripped the way
    > Unicode allows for pre-existing 8-bit standard character sets.
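A minimal sketch of why such a conversion cannot round-trip. It assumes a hypothetical source encoding that distinguishes two positional variants of meteg. The labels METEG_LEFT and METEG_RIGHT are illustrative only and are not real MCW codes. Unicode has a single HEBREW POINT METEG (U+05BD), so when both variants map to it, the inverse mapping becomes ambiguous:

```python
# Hypothetical source-mark labels (NOT real MCW codes) mapped to Unicode.
TO_UNICODE = {
    "METEG_LEFT": "\u05BD",   # both positional variants unify to
    "METEG_RIGHT": "\u05BD",  # U+05BD HEBREW POINT METEG
    "PASHTA": "\u0599",       # U+0599 HEBREW ACCENT PASHTA
}

def inverse(mapping):
    """Group source labels by the Unicode character they convert to."""
    inv = {}
    for src, dst in mapping.items():
        inv.setdefault(dst, []).append(src)
    return inv

def non_round_trippable(mapping):
    """Return each target character that more than one source mark maps to.

    Any entry here means Unicode -> source conversion must guess."""
    return {dst: sorted(srcs) for dst, srcs in inverse(mapping).items()
            if len(srcs) > 1}

if __name__ == "__main__":
    for ch, srcs in non_round_trippable(TO_UNICODE).items():
        print(f"U+{ord(ch):04X} <- {srcs}")
```

Running it prints `U+05BD <- ['METEG_LEFT', 'METEG_RIGHT']`: the pashta maps one-to-one and survives, while the two meteg variants collapse into one code point.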

    Hmm... Is that even true anymore? I think the only remaining cases are
    things like the auxiliary telishas and pashtas, which can and should
    properly be handled by the font. If the text is *valid*, there can
    never be an auxiliary pashta on a non-final letter, nor a
    "non-auxiliary" pashta on a final letter, so one can unambiguously
    reconstruct which was which. Zarqa and Tsinnor are misnamed, but
    workable. The metegs are still not handled quite satisfactorily,
    though. And the canonical combining classes are an unfixable mess, and
    there are a few other small things too, I guess...
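The positional argument above can be sketched in Python. This assumes Unicode input in which pashta is encoded as U+0599 HEBREW ACCENT PASHTA and words are delimited by whitespace or maqaf (U+05BE). The function name classify_pashtas is hypothetical. It only reports whether each pashta sits on the final letter of its word, which, per the reasoning in the post, is enough to recover the distinction for valid text:

```python
import re
import unicodedata

PASHTA = "\u0599"  # U+0599 HEBREW ACCENT PASHTA

def clusters(word):
    """Split a word into clusters: one base character plus its nonspacing marks."""
    out = []
    for ch in word:
        if unicodedata.category(ch) == "Mn" and out:
            out[-1] += ch       # attach point/accent to the preceding base
        else:
            out.append(ch)      # new base character (letter or punctuation)
    return out

def classify_pashtas(text):
    """Return (word_index, cluster_index, 'final' | 'non-final') per pashta."""
    results = []
    # Assumption: whitespace and maqaf both delimit words.
    words = [w for w in re.split(r"[\s\u05BE]+", text) if w]
    for wi, word in enumerate(words):
        cl = clusters(word)
        # Only Hebrew letters (U+05D0..U+05EA) count, so trailing
        # punctuation such as sof pasuq (U+05C3) is skipped.
        letters = [i for i, c in enumerate(cl) if "\u05D0" <= c[0] <= "\u05EA"]
        if not letters:
            continue
        last = letters[-1]
        for i in letters:
            if PASHTA in cl[i]:
                results.append((wi, i, "final" if i == last else "non-final"))
    return results

if __name__ == "__main__":
    # Synthetic word: pashta on the first and on the last letter.
    print(classify_pashtas("\u05D0\u0599\u05D1\u05D2\u0599"))
```

For the synthetic word, the sketch returns `[(0, 0, 'non-final'), (0, 2, 'final')]`. Which label (auxiliary or not) each position corresponds to would then follow the validity rule stated in the post.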

    > One of the unfortunate aspects of this is that the ASCII-hack MCW
    > encoding will likely remain the source encoding for many electronic
    > Biblical Hebrew texts for some time to come, even if published texts
    > are re-encoded as Unicode Hebrew, since MCW permits simple and
    > unambiguous plain-text encoding of distinctions that are important to
    > textual analysis. For example, although my clients at Libronic use
    > Unicode encoding for their electronic BHS edition (because it provides
    > greater interchangeability), they maintain an MCW encoded text as
    > their master source. So much for the 'universal' character set...

    Hey, the "Universal" set isn't always the Right Tool for specialized
    jobs. It's nice when it is, but MCW has *worked* for them for a long
    time, and since they control it, it *must* have precisely the features
    they need. Why not stick with what works?

    ~mark



    This archive was generated by hypermail 2.1.5 : Mon May 24 2004 - 21:56:13 CDT