Kenneth Whistler wrote:
> The key word here is *transliteration*. Anything which is transliterated
> into Latin should of course be represented with Latin characters.
>
> It is precisely for the palaeographers that separate encoding of
> Gothic as a *script* makes sense.
Yes, but!
People who work with hand-written (or cuneiform, or engraved) documents
require access to one (or more) of three levels of representation: the
original script level, the transliteration level, and the transcription
level. (The transcription level is irrelevant to this message and will
be ignored here.)
The original script level is what is important to palaeographers: only
this level preserves the details of variant handwritings, of erroneous
letter shapes, of idiosyncratic abbreviations, etc. etc. This information
can only be conveyed by computer using a pictorial representation.
Some abstraction from this is possible using fancy text: if multiple
handwritings have been identified, the text can be tagged to show
which handwriting is being used over a given range of the text.
SGML entities can represent unique or rare abbreviations or erroneous
letters. Fancy text permits the original analogue document to be
reduced to a denumerable infinity of handwriting variants.
But reduction to plain text is the *essence* of transliteration.
Everything except the bare identity of the letters has been discarded as
irrelevant. All non-palaeographic processing of the text (morphological
analysis, syntax analysis, lexicostatistics, transcription, translation)
can work from the transliteration level, which involves reducing
the denumerably infinite handwriting variants to a finite list of
symbols.
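
To make the reduction concrete, here is a minimal sketch in Python. The `Glyph` record and its `hand` and `note` fields are hypothetical names of my own, standing in for whatever tagging scheme a fancy-text encoding would actually use; they are not taken from any real markup standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One written sign at the fancy-text level (hypothetical model)."""
    letter: str     # the bare identity of the letter
    hand: str       # which identified handwriting produced it
    note: str = ""  # free-form palaeographic remark, e.g. a malformed shape


def transliterate(glyphs):
    """Reduce fancy text to plain text by keeping only letter identity."""
    return "".join(g.letter for g in glyphs)


# Two witnesses that differ palaeographically but not in their letters.
fancy_a = [Glyph("a", "hand-1"), Glyph("b", "hand-1", "malformed"), Glyph("a", "hand-2")]
fancy_b = [Glyph("a", "hand-3"), Glyph("b", "hand-3"), Glyph("a", "hand-3")]

plain_a = transliterate(fancy_a)
plain_b = transliterate(fancy_b)
```

The two fancy-text sequences are unequal, yet both collapse to the same plain string: that loss is exactly what the transliteration level is for.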
It is, abstractly considered, all the same whether the characters used
in the transliteration are Latin letters or ones taken from the native
script. An Old Church Slavonic manuscript is equally transliterated
whether it is represented in Cyrillic or in Latin print; considerations
of convenience, cost, or readability may influence the decision to use
one or the other, but in either case the palaeographic aspects of the
manuscript are lost. The same would be true of representing Gothic
manuscripts in Gothic or Latin plain text, or Etruscan inscriptions in
Etruscan, Greek, or Latin plain text.
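
A small Python sketch of the same point, assuming a one-to-one correspondence between the two alphabets. The four-letter table below is invented for illustration only and is not a scholarly transliteration scheme; a real one would cover the whole alphabet and might not be strictly one-to-one.

```python
# Hypothetical one-to-one letter table (illustrative, not a standard scheme).
PAIRS = {"с": "s", "л": "l", "о": "o", "в": "v"}

TO_LATIN = str.maketrans(PAIRS)
TO_CYRILLIC = str.maketrans({lat: cyr for cyr, lat in PAIRS.items()})

cyrillic_text = "слово"
latin_text = cyrillic_text.translate(TO_LATIN)
round_trip = latin_text.translate(TO_CYRILLIC)
```

As long as the table is one-to-one, either plain-text form can be recovered from the other, so the two carry the same information and neither carries any of the palaeographic detail.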
The Unicode Way is to represent only plain-text differences with
distinct character ranges: the four or five distinct Georgian alphabets
described at http://titus.uni-frankfurt.de/personalia/jg/unicode/unicode.htm
are collapsed onto two (bicameral text and unicameral text), which are
then further reduced by mapping unicameral text to lower-case bicameral
text.
This means that fancy-text methods are required to distinguish one
kind of Georgian from another, and no single font can capture them all,
but because they are "the same" at the level of transliteration, it
suffices to use just one, or one and a half, mappings.
We don't go further and unify Georgian with Greek or Latin, because the
letters are unrecognizably different from any sort of Greek. But this
consideration doesn't apply to Gothic or Etruscan, which closely
resemble the sorts of Greek writing that were in use during the
appropriate span of time. They would doubtless be unreadable with
a modern Greek font, but modern Greek text would be hard to read
in a font modeled after 5th-century B.C.E. inscriptions, and modern
English text would be almost impossible to read in a font designed on
the basis of Carolingian miniatures, or for that matter 18th-century
German handwriting, both of whic
-- John Cowan http://www.ccil.org/~cowan cowan@ccil.org e'osai ko sarji la lojban