Re: mixed-script writing systems

From: Dean Snyder
Date: Thu Nov 21 2002 - 09:04:49 EST

    John Cowan wrote at 4:54 PM on Friday, November 15, 2002:

    >Dean Snyder scripsit:
    >> Group A writes the logically ordered graphemic sequence *acme* as "acme";
    >> group B as "emca".
    >This fact requires separate encoding, because bidi-ness is a noncontextual
    >property of a Unicode character.
    >> Group A pronounces the graphemic sequence "acme" as /acme/; group B as
    >This is irrelevant.
    >> Group A uppercases the graphemic sequence "acme" as "ACME"; group B as
    >> "acme" (i.e., no uppercase).
    >This is handled with language-specific casing.
    >> Group A ligates the sequence "acme" as "a" + "cme"; group B as "ac" + "me".
    >This isn't even a Unicode issue, it's a font issue.
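
As an illustration of the "noncontextual property" point quoted above, here is a minimal sketch using Python's standard `unicodedata` module. It only shows that the bidi class and the default uppercase mapping are looked up per code point, independent of the surrounding text or of who is writing. The helper name `char_properties` and the choice of sample characters (including the Hebrew letter) are assumptions made for this illustration and do not come from the thread.

```python
import unicodedata


def char_properties(ch):
    """Return (name, bidi class, default uppercase) for a single code point.

    Hypothetical helper for illustration only: all three values come from
    per-character data and ignore context and language.
    """
    return (unicodedata.name(ch), unicodedata.bidirectional(ch), ch.upper())


latin_a = "\u0061"          # LATIN SMALL LETTER A, a left-to-right cased letter
hebrew_alef = "\u05D0"      # HEBREW LETTER ALEF, a right-to-left caseless letter
gothic_ahsa = "\U00010330"  # GOTHIC LETTER AHSA, encoded separately from Latin

for ch in (latin_a, hebrew_alef, gothic_ahsa):
    print(char_properties(ch))
```

Under this model, two communities that share glyph shapes but differ in directionality cannot be told apart by a single code point, since the bidi class belongs to the character itself. Language-specific casing, by contrast, is layered on top of these defaults and is not shown here.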

    Notwithstanding these specific responses to my questions, the broader
    point of my example has been missed; or at least no reply addressed the
    more abstract issue for which I was trolling.

    What are the properties which will trigger separate Unicode encodings for
    characters typically or always represented by identically shaped glyphs?
    Are these rules formalized anywhere in Unicode documents? Given Ken
    Whistler's response to another email in this thread, I suspect not,
    especially for historic scripts, in which I am most interested:

    Kenneth Whistler wrote at 12:49 PM on Monday, November 18, 2002:

    >Andrew West wrote:
    >> Nevertheless, Gothic has
    >> been encoded in Unicode, and this may provide an unwelcome precedent for
    >> encoding other mixed-script writing systems.
    >What you are getting at is the complicated problem of sorting out all
    >the historical connections between various related alphabets and trying
    >to sift them into categories which make sense as scripts and categories
    >which are simply font variants within a script. For modern scripts this
    >is less of a problem, since we have modern practice and typography to
    >rely on to help make the distinctions. For *historic* scripts, on the
    >other hand, it is murkier.
    > ...
    >What it comes down to is the fact that for historic scripts in
    >particular, there are no defined criteria that would enable us
    >to simply *discover* the right answer regarding the identity of
    >scripts. To a certain extent, the encoding committees need to
    >make arbitrary partitions of historic alphabets through time
    >and space, reflecting scholarly praxis as far as feasible, and
    >then live with the results. At least this procedure makes it
    >*possible* to represent the texts reliably, once the scripts
    >and their variants have been standardized.

    What are the criteria used to make these "arbitrary partitions"? What is
    determinative of "scholarly praxis"? And would not some or all of the
    examples I give above be governed by such criteria?


    Dean A. Snyder
    Scholarly Technology Specialist
    Center For Scholarly Resources, Sheridan Libraries
    Garrett Room, MSE Library, 3400 N. Charles St.
    The Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
    Digital Hammurabi:
    Initiative for Cuneiform Encoding:
