From: Christopher Fynn (cfynn@gmx.net)
Date: Sun May 30 2004 - 13:27:07 CDT
John Hudson wrote:
....
> I have been thinking today that part of the reason for the debate is
> that Unicode has a singular concept of 'script', a bucket into which
> variously shaped concepts of writing systems must be put or rejected.
> I don't think there is anything conceptually wrong with the idea that
> specific instances of a single script might be separately encoded if
> there is a need or desire to distinguish them in plain text. It just
> happens that Unicode has only one word that can be applied to such
> instances, and that is 'script'. It seems clear to me now that what
> Unicode calls a script needn't necessarily be what semiticists, or
> anyone else, calls a script. A functional Unicode definition of script
> might be formed as: a finite collection of characters that can be
> distinguished in plain text from other collections of characters.
John
"Script" is already defined in ISO 10646 as:
<<4.35 script: A set of graphic characters used for the written form of
one or more languages.>>
and "graphic character" is defined as:
<< 4.20 graphic character: A character, other than a control function,
that has a visual representation normally handwritten, printed, or
displayed.>>
So I guess if any further definition of "script" is necessary, it should
be based on this.
Further, the (draft?) ISO 15924 standard uses the same definition:
<< 3.7 script: A set of graphic characters used for the written
form of one or more languages. (ISO/IEC 10646-1) (fr 3.6 écriture)>>
but adds an extra note:
<< NOTE 1: A script, as opposed to an arbitrary subset of
characters, is defined in distinction to other scripts; in
general, readers of one script may be unable to read the
glyphs of another script easily, even where there is a
historic relation between them (see 3.9).>>
[3.9 script variant: A particular form of one script which is so
distinctive a rendering as to almost be considered
a unique script in itself. (fr 3.9 variante d'écriture)]
With regard to historic and archaic scripts, TUS itself states:
"The overall capacity for more than a million characters is more than
sufficient for all known character encoding requirements, including full
coverage of all minority and historic scripts of the world." (1.0)
and
"As the universal character encoding scheme, the Unicode Standard must
also respond to scholarly needs. To preserve world cultural heritage,
important archaic scripts are encoded as proposals are developed." (1.1.2)
So there is a clear statement of purpose to give full coverage to *all*
minority and historic scripts and to encode "important" archaic scripts.
In 1.2, "Design Goals", TUS states:
"The primary goal of the development effort for the Unicode Standard was
to remedy two serious problems common to most multilingual computer
programs. The first problem was the overloading of the font mechanism
when encoding characters."
Telling people who propose a script that they can "just use a
different font" could very easily contradict this stated goal.
> There are very real issues of software implementation, font
> development, collation, text indexing and searching, etc. that arise
> from encoding multiple instances of what some users consider a single
> script, whether users in general opt to make the distinction in plain
> text or not, by using the separate character collections or unifying
> text in a single character collection and making the distinction at a
> higher level. I'm beginning to think that our time would be better
> spent thinking about those issues.
>
These are of course real issues - particularly collation, text
indexing, searching and - where a written language occurs in several
scripts - the ability to display text encoded in one script with glyphs
of another. Establishing standard, straightforward and widely supported
means to deal with these issues is a worthy goal. In many cases the
solutions for these problems are in fact already specified or pretty
clear - and, relatively speaking, they are reasonably straightforward
to implement.
Their absence - or lack of support - should not be a reason to
reject a script proposal on the grounds that "it will cause
difficulties". This is the sort of argument put forward by PR China
when they submitted their proposal for a host of precomposed Tibetan
characters. When Indic scripts were first encoded, a whole software
infrastructure and font/rendering technologies which were not then
available in common desktop operating systems were assumed - and it has
taken a decade for this encoding to be anything like widely supported on
a practical level.
IMO, in the long term, encoding of archaic scripts is going to benefit
the whole scholarly community. When children discover all kinds of
scripts on their computers, they are going to become curious and play
with them, and some of them will be inspired to go out and find out more
about these scripts. Some will develop a serious interest, and a
few will end up being the Palaeographers, Semiticists, Sanskritists and
so on of tomorrow.
- Chris