From: John Hudson (john@tiro.ca)
Date: Fri Mar 03 2006 - 12:48:30 CST
E. Keown wrote:
> Character set, a definition:
>     A character set is a computerized version
>     of any alphabet (or other writing system).
>
>     Each letter, number, symbol, etc. of the
>     computerized alphabet is assigned a unique
>     number for the computer to use in software.
This definition suggests, or presupposes, a direct correlation between the structure of an 
encoding and the structure of a writing system. The problem with this is that a) it is not 
always the case, b) the structural analysis of writing systems is a relatively new field, 
and c) there is sometimes disagreement about how to correctly describe the structure of a 
writing system (see, for instance, the discussions regarding Tamil on the Indic list). I 
am wary of a definition that would lead people to conclude that a character encoding must 
directly correlate to a particular understanding of the structure of a writing system. The 
goal of a character encoding is to be workable, i.e. to enable the encoding of text and 
the performance of typical text processing functions (searching, sorting, string 
comparisons, etc.). It is not the goal of a character encoding to provide a computerised 
model of how a writing system is thought of in the minds of the people who write it or 
study it.
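As a small illustration of point (a), here is a minimal Python sketch. The choice of the letter "é" is an assumption made for the example and does not come from the discussion above. It shows that Unicode can represent what a writer regards as a single letter either as one encoded character or as a sequence of two, so the structure of the encoding need not mirror the structure of the writing system.

```python
import unicodedata

# Two encoded forms of what a writer would call the single letter "e acute".
# The letter is an illustrative choice, not one taken from the thread.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# The raw code point sequences differ, so a naive comparison fails.
same_raw = precomposed == decomposed

# After normalisation to a common form, the two strings compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

# One "letter" for the reader, but one or two encoded characters.
lengths = (len(precomposed), len(decomposed))

print(same_raw, same_nfc, lengths)
```

The encoding is workable in the sense described above because text processing (here, string comparison after normalisation) gives the expected result, not because each encoded element corresponds to one unit of the script.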
The Unicode glossary defines Character Set as
        A collection of elements used to represent textual
        information.
which seems to me to be a good place to start. Notice that the definition references the 
*use* of the characters, rather than their identity as it relates to writing systems. The 
Unicode glossary seems to me quite a good 'Fachwörterliste':
        http://www.unicode.org/glossary/
If I wanted a more explanatory definition for 'non-geeks', I would try something like this:
        A character set is a collection of elements (letters,
        symbols, punctuation, numerals, etc.) needed to represent
        text on a computer. Each element in a character set is
        assigned a unique numeric identity, which is recognised
        by computer software employing the character set.
        Standardised character sets facilitate the interchange of
        text between computers, and enable computerised text
        processing operations such as searching, sorting, and
        comparing text. A particular character set may encode
        one or more writing systems.
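The explanatory definition can be made concrete with a short Python sketch. It assumes Unicode as the character set, and the sample strings are made up for illustration. Each element has a unique number, one set covers more than one writing system, and those numbers are what simple searching, sorting, and comparison operate on.

```python
import unicodedata

# A Latin letter, a Hebrew letter, a digit, and a punctuation mark,
# all drawn from the same character set (sample text is hypothetical).
sample = "A\u05d05?"

# Each element has a unique numeric identity (its code point).
code_points = [ord(ch) for ch in sample]

# Each element also has a standardised name, independent of any font.
names = [unicodedata.name(ch) for ch in sample]

# Plain sorting and searching work on the numbers, not on the
# writing system: "B" (66) sorts before "a" (97).
sorted_words = sorted(["b", "a", "B"])
found = "set" in "character set"

print(code_points)
print(names)
print(sorted_words, found)
```

Sorting by raw code point, as here, is not the same as the alphabetical order a reader of a given language expects; language-aware sorting needs collation rules on top of the character set.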
John Hudson
-- 
Tiro Typeworks        www.tiro.com
Vancouver, BC        john@tiro.ca

I am not yet so lost in lexicography, as to forget
that words are the daughters of earth, and that things
are the sons of heaven.  - Samuel Johnson