Re: [hebrew] Re: Karaite manuscript

From: Behnam (behnam.rassi@gmail.com)
Date: Sun Jul 22 2007 - 19:57:21 CDT


    You may have a practical view on this.
    But if I were to encode Arabic script from scratch, I'd retain
    only the bare characters with different joining behaviour (double
    joiner, right-only joiner, etc., and for the occasion, no left break!)
    and a collection of dots, signs and diacritics, and then I'd leave the
    assembly to the keyboard designer and font maker.
    But of course I'm not!

    Behnam

    On 22-Jul-07, at 2:08 PM, Philippe Verdy wrote:

    > Are you proposing here to encode the Arabic
    > archaeographemes/archeographemes (or “archigraphemes” as you call
    > them, but I’m not sure this is the correct term in English, as
    > “archi-” is another prefix with another meaning, marking emphasis,
    > stronger than “super-” and quite similar to “hyper-”), i.e. the
    > skeletons (without the normally required markers), and possibly
    > the markers themselves too, separately?
    >
    >
    >
    > If these were encoded in some extended Arabic block, I’m not sure
    > it would cause severe havoc. Even for searches over the Internet or
    > in plain-text documents, the morphological similarities between
    > otherwise unrelated modern letters can be analyzed by some custom
    > “decomposition” using PUAs (for now, because these units are not
    > encoded separately), or using a tailored collation…
    >
    >
    >
    > As this will be needed for palaeographic studies, most of the
    > existing texts will not have to be re-encoded and changed, even if
    > they appear to be really composite letters. Anyway, the Unicode
    > stability policy prohibits “decomposing” them using any normalized
    > decomposed forms; this can still be done privately or through
    > local collation algorithms, built specifically for paleographers.
    > There should be no change to existing Arabic texts, and the letters
    > should not be decomposed in standard texts.
    >
    >
    >
    > Anyway, the issue is quite similar with other letters in alphabetic
    > scripts: the ae and oe ligatures in Latin can be decomposed in
    > some languages, and they still should be decomposed when doing
    > morphological analysis, even in today’s modern texts (at least in
    > French), even if they should not be decomposed this way in standard
    > texts (but it’s true that Unicode provided compatibility
    > decompositions for them, something that was not done for Arabic
    > letters with markers, and that can’t be done now)…
    >
    >
    >
    > From: Thomas Milo [mailto:t.milo@chello.nl]
    > Sent: Thursday, 19 July 2007 22:09
    > To: Simon Montagu; verdy_p@wanadoo.fr
    > Cc: 'John Hudson'; unicode@unicode.org; 'Hebrew List'
    >
    >
    >
    > All these observations about asynchronic text notation (text
    > recorded in phases) using independent character subsets
    > (archigraphemic skeleton, disambiguation dots, vowel marks) even
    > across nominally different writing systems also pertain to Arabic.
    > Particularly regarding the text transmission of the Holy Qur'an
    > this is very relevant.
    >
    >
    >
    > HQ Codices of the first few centuries were written without
    > consonant markers (originally not points but small nib imprints)
    > and vowel disambiguation marks (which were points in the earliest
    > Arabic script). Editors (contemporary or later) added the consonant
    > disambiguation markers and vowel signs (personal communication from
    > Yasin Dutton during the Corpus Coranicum Workshop organized by the
    > European Science Foundation in 2005, Berlin).
    >
    >
    >
    > http://www.esf.org/activities/exploratory-workshops/humanities-sch/2005/corpus-coranicum-exploring-the-textual-beginnings-of-the-quran.html
    >
    >
    >
    > To this day, this horizontal segmentation remains the deep
    > structure of Arabic. Understanding it helps to deal with its
    > generative power to combine any marker with any basic letter (i.e.,
    > archigrapheme). Hebrew, Aramaic and Arabic do occur in various
    > mixes along this horizontal segmentation, which provides an
    > additional argument for dealing with the horizontal segmentation of
    > Arabic and related scripts.
    >
    >
    >
    > Unicode's present fixation with vertical segmentation (leading to
    > the irrelevant concept of ligatures) in Arabic and national subsets
    > leads to
    >
    >
    >
    > 1. uneconomical proliferation of Arabic code points consisting of
    > generic archigraphemes and generic markers
    >
    > 2. serious problems in digitizing historical and even contemporary
    > texts.
    >
    >
    >
    > See my Unicode Tutorial: page 7 for examples of Unicode-induced
    > ambiguity in encoding exactly identical Arabic character groups,
    > and page 12 for examples of the resulting everyday chaos:
    >
    > www.decotype.com/publications/unicode-tutorial.pdf
    >
    >
    >
    >
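
    The "custom decomposition" for searches mentioned in the quoted
    message can be sketched roughly as follows. This is only an
    illustration under stated assumptions: the folding table and the
    name skeleton_key are hypothetical, not part of Unicode or of any
    proposal in this thread, and the table ignores position-dependent
    cases such as noon and yeh. It folds dotted letters onto a dotless
    skeleton and drops combining marks to build a search key, leaving
    the stored text untouched.

```python
import unicodedata

# Hypothetical folding table (an assumption for illustration only):
# each dotted letter maps to a character standing in for its skeleton.
SKELETON = {
    "\u0628": "\u066E",  # BEH   -> DOTLESS BEH
    "\u062A": "\u066E",  # TEH   -> DOTLESS BEH
    "\u062B": "\u066E",  # THEH  -> DOTLESS BEH
    "\u062C": "\u062D",  # JEEM  -> HAH
    "\u062E": "\u062D",  # KHAH  -> HAH
    "\u0630": "\u062F",  # THAL  -> DAL
    "\u0632": "\u0631",  # ZAIN  -> REH
    "\u0634": "\u0633",  # SHEEN -> SEEN
    "\u0636": "\u0635",  # DAD   -> SAD
    "\u0638": "\u0637",  # ZAH   -> TAH
    "\u063A": "\u0639",  # GHAIN -> AIN
    "\u0641": "\u06A1",  # FEH   -> DOTLESS FEH
    "\u0642": "\u066F",  # QAF   -> DOTLESS QAF
}


def skeleton_key(text):
    """Return a search key: combining marks removed, dotted letters folded."""
    # NFD separates canonical combining marks (e.g. hamza above) from bases.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        # Skip nonspacing marks, which include the Arabic vowel signs.
        if unicodedata.category(ch) == "Mn":
            continue
        out.append(SKELETON.get(ch, ch))
    return "".join(out)


# BEH-ALEF-BEH and TEH-ALEF-BEH share one skeleton, so their keys match.
same = skeleton_key("\u0628\u0627\u0628") == skeleton_key("\u062A\u0627\u0628")
```

    Because the key is computed on the side, existing texts need no
    re-encoding, which is the point made in the quoted message about
    private or local decomposition.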


