Let me comment on some replies to my definition of writing
system:
RM>A writing system can include more than one alphabet, or
script, as an element -- and always includes some behavior.
I agree, that a writing system includes behaviours, but my
definition didn't allow for it to include more than one script.
Carl-Martin wrote:
However, the concept of "writing system" is lacking, and I
would see it in a somewhat different way than Peter Constable
did in his contribution...
In my understanding, a writing system is a concept located on a
higher level: It is the totality of graphical symbols of the
semiotic system used by a certain language community.
Consequently, a writing system may comprise more than one
script, and in fact even the English (or German or French ...)
writing system allows for e.g. Roman numbers, technical
symbols, etc. which, in our daily written communication,
co-occur with Latin script letters on panels, technical
instructions, etc. No-one, however, would enumerate e.g. Roman
numbers in the alphabet. On the other hand, the set of
different scripts admitted in a definite writing system is
limited, since the English language community will not
understand e.g. a Devanagari script unit (as, say, an
abbreviation or a technical label): no communicative value has
been defined for this symbol.
The "classical" example for a complex "writing system",
according to this understanding, would be the Japanese system
using (at least!) three scripts simultaneously.
I dare state that in terms of information technology, "writing
system" almost corresponds to "locale", at least as far as the
use of graphical symbols is concerned.
[end of quotation]
I agree that Carl-Martin's perspective is different from mine,
though I think his usage fits the way Rick was using the
term.
Let me expand a little on how I arrived at my definition. (This
is also the definition being used by my co-workers with the
Non-Roman Script Initiative and those developers in SIL that
are working on implementing multilingual capabilities in SIL
software.)
My perspective comes from my background as a linguist
previously working in Southeast Asia and now working as part of
a team trying to address the needs of enabling software to work
for *all* of the world's thousands of languages, minor as well
as major. (You can look at www.sil.org/ethnologue/ to see what
our mandate is.) As we have looked at these issues, it has not
been numerals and technical symbols that have presented the
biggest challenges. Rather, it is the world's scripts, and
minority language orthographies based on those scripts. In what
we have been needing to do, technical symbols (other than IPA)
were not at all in our minds as we talked of writing systems.
We recognised that, as people talk of scripts, they can tell
unambiguously (assuming familiarity) that a character belongs
to some particular script but not to others. For example,
nobody would dispute that the character which has the Unicode
name ETHIOPIC SYLLABLE XWA belongs to a script that most people
call "Ethiopic". At the same time, there are many languages
that are written using Ethiopic script, and not all of them use
the character just mentioned. Likewise, not all of these
languages necessarily have the same collation sequence
(collation sequences certainly can't be the same if the
orthographic inventories aren't the same). For that matter, for
one of these languages, it's possible that there may be more
than one collation sequence involved.
The way a given script is used in a particular language
includes an orthographic inventory that is a subset of those
characters in the script, and it includes language-specific
collation sequences. It can also include language-specific
behaviour. Let me give an example with Thai script: This script
includes certain characters which are written above other
characters, such as SARA II, MAI EK, MAITAIKHU. Let's call this
set Cs (combining (superior)). Now, in the implementation of
this script for the Siamese (Std Thai) language, MAITAIKHU
cannot co-occur with any other Cs character. This is part of
the behaviour of Siamese writing, and software implementations
often enforce this behaviour. This happens, for example, in
Thai versions of Microsoft software. When Thai script is used
for writing other languages spoken in Thailand (e.g. Bru) which
have quite different phonology, however, it may be necessary to
combine MAITAIKHU with other Cs characters. Thus, the writing
behaviours of Siamese and Bru are different.
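To make the Siamese/Bru contrast concrete, here is a minimal sketch of how such a language-specific rule might be checked. It is only an illustration: the function names and the maitaikhu_exclusive flag are hypothetical, the Cs set holds just the three characters named above rather than the full set, and this is not how any existing implementation encodes the rule.

```python
# Hypothetical sketch; names and rule encoding are illustrative only.
SARA_II = "\u0e35"    # THAI CHARACTER SARA II
MAI_EK = "\u0e48"     # THAI CHARACTER MAI EK
MAITAIKHU = "\u0e47"  # THAI CHARACTER MAITAIKHU

# Partial Cs set: only the three characters named in the text.
CS = {SARA_II, MAI_EK, MAITAIKHU}


def cs_runs(text):
    """Yield each maximal run of consecutive Cs characters in text."""
    run = ""
    for ch in text:
        if ch in CS:
            run += ch
        else:
            if run:
                yield run
            run = ""
    if run:
        yield run


def is_valid(text, maitaikhu_exclusive):
    """Check the co-occurrence rule for one writing system.

    When maitaikhu_exclusive is True (assumed here for Siamese), a run
    containing MAITAIKHU must contain nothing else. When it is False
    (assumed here for Bru), any combination of Cs characters passes.
    """
    if not maitaikhu_exclusive:
        return True
    return all(len(run) == 1 for run in cs_runs(text) if MAITAIKHU in run)


KO_KAI = "\u0e01"  # a base consonant to carry the marks
stacked = KO_KAI + MAITAIKHU + MAI_EK

siamese_ok = is_valid(stacked, maitaikhu_exclusive=True)
bru_ok = is_valid(stacked, maitaikhu_exclusive=False)
```

The same script-level data (the Cs set) is shared; only the per-language flag differs, which is the distinction between script and writing system being drawn here.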
As we enable our software to work with Amharic, Tigrinya and
Gurage, Siamese, Eastern Red Karen and Bru, we need to define a
collection of information that describes everything related to
how a script is used for writing a particular language. This
isn't the same as the script, because a script is implemented
differently in different languages. It isn't the same as the
language, which includes more than just script-related
information, and a given language can be written with
completely different scripts (more on that in a moment). We
needed something below the level of script, below language, and
"writing system" is what we chose.
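As a rough illustration of what such a collection of information might look like, the sketch below bundles a language, a script, an orthographic inventory and one or more named collation sequences. The class, its field names and the toy data ("Language A", "Language B", and a three-letter stand-in script) are all assumptions made for the example, not a description of the SIL software.

```python
from dataclasses import dataclass, field


@dataclass
class WritingSystem:
    """Hypothetical record: how one script is used for one language."""
    language: str
    script: str
    inventory: frozenset                            # subset of the script's characters
    collations: dict = field(default_factory=dict)  # name -> characters in sort order

    def uses(self, ch):
        """True if ch is part of this language's orthographic inventory."""
        return ch in self.inventory

    def sort(self, words, collation="default"):
        """Sort words by the named language-specific collation sequence."""
        order = {ch: i for i, ch in enumerate(self.collations[collation])}
        return sorted(words, key=lambda w: [order[ch] for ch in w])


# Toy data: two invented languages sharing one stand-in script "abc".
lang_a = WritingSystem("Language A", "ToyScript", frozenset("abc"),
                       {"default": "abc"})
lang_b = WritingSystem("Language B", "ToyScript", frozenset("ab"),
                       {"default": "ba"})

sorted_a = lang_a.sort(["ba", "ab"])
sorted_b = lang_b.sort(["ba", "ab"])
```

Both records name the same script, yet they differ in inventory and in sort order, so neither the script alone nor the language alone identifies the data.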
Now, there is also the issue that a given language may be
written with more than one script. Both Rick and Carl-Martin
have referred to this. Carl-Martin gave Japanese as an example,
and I suspect Japanese and Korean may have been in Rick's mind.
I have to admit that CJK is not what I'm most familiar with,
and it wasn't foremost in our thinking as we grappled with
these issues. There are many cases of languages which are
written with more than one script, but I think Japanese and
Korean are exceptions to the norm, even if they are the cases
most familiar to a lot of people.
In Japanese and Korean, a single writer will use more than one
script (Chinese characters and Hangul for Korean; Chinese
characters, Katakana and Hiragana for Japanese), and will even
use them in a single document, on a single page, and
in a single sentence. In these languages, there are certain
words that can only be written using one or the other script,
and so a writer may be forced to alternate. There are far more
cases in the world, however, in which a given writer will use
exclusively one script, usually because that's the only one
that they know, and that is the norm for their language. A few
examples:
- Serbo-Croatian: written by some using Latin script and others
using Cyrillic
- Tai Dam (spoken in Vietnam, Laos, US, France): written by
some using traditional Tai Dam script, by others using
Vietnamese-style Latin, and by others using Lao
- Tai Lue (spoken in Yunnan, Laos, Thailand): written by some
using Lanna script, by others using New Tai Lue script (a
simplifying revision of Lanna script with enough changes that
it should be considered a different, even if related, script)
- Koorete (spoken in Ethiopia): written by some using Ethiopic
script, by others using Latin
- Wolaytta (spoken in Ethiopia): written by some using Ethiopic
script, by others using Latin
- Hindi/Urdu: written by some using Devanagari, by others using
Nastaliq Arabic
- Duruwa (spoken in India): written by some using Devanagari,
by others using Oriya
This is but a small sample of a situation that is evolving as a
large number of minority languages are just beginning, or have
yet to begin, to become literary languages. There are other minority
languages in Ethiopia that use both Ethiopic and Roman; there
are other languages in India that use more than one script of
that region, and this is probably true of some neighboring
countries; I suspect that this situation occurs in Insular
Southeast Asia; and there are numerous languages in Southeast
Asia, where languages are often spoken in 2 to 4 countries and
may also have traditional scripts, for which this is or likely
will become the case.
In all of these situations, a given document would generally
appear in only one script; if more than one script is ever used
in a single document, it would be a polyglot in which the
different scripts are clearly separated.
In summary, for our software development needs, we have needed
to define a term which represents the combination [ language x
script ] and have called this "writing system", and have chosen
to define writing system to be the implementation of a single
script, since that is by far the most common case we will have
to deal with. We will need to consider how the cases of
Japanese and Korean will impact us, so it has been good for me
that this discussion has forced me to think about these cases a
little more.
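One way to picture the [ language x script ] combination is as a set of pairs, each pair naming one writing system. The sketch below uses a few of the examples listed earlier; the variable and function names are hypothetical, and the script labels are simply the names used in this message.

```python
# Hypothetical sketch: each (language, script) pair is one writing system.
WRITING_SYSTEMS = {
    ("Serbo-Croatian", "Latin"),
    ("Serbo-Croatian", "Cyrillic"),
    ("Tai Dam", "Tai Dam"),
    ("Tai Dam", "Latin"),
    ("Tai Dam", "Lao"),
    ("Koorete", "Ethiopic"),
    ("Koorete", "Latin"),
}


def scripts_for(language):
    """All scripts for which this language has a writing system."""
    return sorted(s for lang, s in WRITING_SYSTEMS if lang == language)


def languages_for(script):
    """All languages that have a writing system based on this script."""
    return sorted(lang for lang, s in WRITING_SYSTEMS if s == script)


tai_dam_scripts = scripts_for("Tai Dam")
latin_languages = languages_for("Latin")
```

Under this single-script definition, Japanese would need either several entries or a different model, which is the open question noted above.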
Before I finish, I had mentioned some other sources which gave
a definition of writing system in line with our use, and I
thought I'd just mention those:
The first quotation I was thinking of comes from an article in
the November 1998 issue of Microsoft Systems Journal,
"Supporting Multilanguage Text Layout and Complex Scripts with
Windows NT 5.0", by F. Avery Bishop, David C. Brown, and David
M. Meltzer, pp. 57 - 70. On page 59 in that article, they give
a glossary. These are two of the definitions given:
Script: A collection of characters for displaying written text,
all of which have a common characteristic that justifies their
consideration as a distinct set. One script may be used for
several different languages... and some written languages
require multiple scripts (for example, Japanese... )...
Writing system: The collection of scripts and orthography
required to represent a given human language in visual media.
Their definition of script is in agreement with mine. Their
definition for writing system, though, is more in line with
that given by Rick and Carl-Martin. When I first read the MSJ
article over three months ago, I was struck most by the fact
that they were making the important distinction between script
and writing system, and I didn't take note of the way in which
their definition of writing system differs from mine. So, my
memory on the point currently under discussion was in error.
Again, CJK wasn't a big factor in my thinking, but it very
obviously has been an important consideration for Microsoft.
The second source is a manuscript by Richard Sproat (to appear,
"A Computational Theory of Writing Systems"):
"...we will use the terms 'script', 'orthography' and 'writing
system', in their conventional senses as follows. A 'script' is
just a set of distinct marks conventionally used to represent
the written form of one or more languages: crucially, one can
speak of a script without implying its use for a given
language... On the other hand, a writing system is a script
used to represent a particular language. Thus 'writing system'
implies 'writing system for a given language'. We will use the
terms 'orthography' and 'writing system' interchangeably..."
(Sproat adds a note here about distinctions between orthography
and, say, technography as discussed by Mountford which I
referred to in my original message.) It seems to me that
Richard's definitions are precisely in agreement with mine. I
note with interest that Richard's work looks at a variety of
writing systems and scripts, including Russian, Belorussian,
Korean, Chinese, Japanese, Devanagari, Pahawh Hmong, Ancient
Egyptian, Aramaic. While he has considered CJ and K, he is
attempting to cover scripts and writing systems in full
generality. While his work is theoretical, I think there is
important similarity with the practical work we are attempting
to do in that we are developing very general implementations
that can deal with any case.
Peter Constable
Non-Roman Script Initiative, SIL