From: John Hudson (john@tiro.ca)
Date: Fri Mar 03 2006 - 12:48:30 CST
E. Keown wrote:
> Character set, a definition :
> A character set is a computerized version
> of any alphabet (or other writing system).
>
> Each letter, number, symbol, etc. of the
> computerized alphabet is assigned a unique
> number for the computer to use in software.
This definition suggests, or presupposes, a direct correlation between the structure of an
encoding and the structure of a writing system. The problems with this are that a) it is not
always the case, b) the structural analysis of writing systems is a relatively new field,
and c) there is sometimes disagreement about how to correctly describe the structure of a
writing system (see, for instance, the discussions regarding Tamil on the Indic list). I
am wary of a definition that would lead people to conclude that a character encoding must
directly correlate to a particular understanding of the structure of a writing system. The
goal of a character encoding is to be workable, i.e. to enable the encoding of text and
the performance of typical text processing functions (searching, sorting, string
comparisons, etc.). It is not the goal of a character encoding to provide a computerised
model of how a writing system is thought of in the minds of the people who write it or
study it.
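A small illustration of that gap, offered as a sketch and not as anything taken from the discussion itself: in Unicode, what a writer of French regards as the single letter e-acute can be encoded either as one code point or as a base letter followed by a combining accent. The example uses Python's standard unicodedata module; the variable names are purely illustrative.

```python
import unicodedata

# One letter to the writer, two possible encodings in Unicode.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The two code point sequences differ in length and do not compare equal.
print(len(precomposed), len(decomposed))
print(precomposed == decomposed)

# Normalising to NFC composes the sequence, so the two forms can be compared.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)
```

The encoding is workable either way; neither form claims to be the "true" structure of the letter.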
The Unicode glossary defines Character Set as
    A collection of elements used to represent textual
    information.
which seems to me to be a good place to start. Notice that the definition references the
*use* of the characters, rather than their identity as it relates to writing systems. The
Unicode glossary seems to me quite a good 'Fachwörterliste':
http://www.unicode.org/glossary/
If I wanted a more explanatory definition for 'non-geeks', I would try something like this:
    A character set is a collection of elements (letters,
    symbols, punctuation, numerals, etc.) needed to represent
    text on a computer. Each element in a character set is
    assigned a unique numeric identity, which is recognised
    by computer software employing the character set.
    Standardised character sets facilitate the interchange of
    text between computers, and enable computerised text
    processing operations such as searching, sorting, and
    comparing text. A particular character set may encode
    one or more writing systems.
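For readers who find code easier than prose, the definition above might be sketched roughly as follows. The toy character set, its numbers, and the encode/decode helpers are all invented for illustration and do not correspond to any real standard; the last few lines use Python's built-in ord() and the standard unicodedata module to show one real character set (Unicode) covering several writing systems.

```python
import unicodedata

# Hypothetical toy character set: every element gets a unique number.
TOY_CHARSET = {"a": 1, "b": 2, "c": 3, " ": 4, ".": 5}
TOY_REVERSE = {number: element for element, number in TOY_CHARSET.items()}

def encode(text):
    """Turn text into the list of numbers the software actually handles."""
    return [TOY_CHARSET[element] for element in text]

def decode(numbers):
    """Turn a list of numbers back into text."""
    return "".join(TOY_REVERSE[number] for number in numbers)

numbers = encode("a cab.")
print(numbers)
print(decode(numbers))

# Searching and comparing operate on the numbers alone.
print(TOY_CHARSET["c"] in numbers)
print(encode("abc") < encode("acb"))

# A real character set may encode more than one writing system:
# Latin, Greek, and Hebrew letters all have numbers in Unicode.
for element in ["A", "\u03b1", "\u05d0"]:
    print(f"U+{ord(element):04X}", unicodedata.name(element))
```

Note that comparing by number gives only a crude ordering; sorting text the way its readers expect generally needs more than the raw numeric values.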
John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         john@tiro.ca

I am not yet so lost in lexicography, as to forget
that words are the daughters of earth, and that
things are the sons of heaven. - Samuel Johnson