Re: Chapter on character sets

From: Valeriy E. Ushakov (uwe@ptc.spbu.ru)
Date: Thu Jun 15 2000 - 10:26:19 EDT


On Thu, Jun 15, 2000 at 04:02:18 -0800, Lars Marius Garshol wrote:

> I would be glad if people here could read through it and tell me if
> they see any mistakes (or other kinds of things that could be
> improved).

It would be good if 1.2 (and a few other places) followed UTR17,
"Character Encoding Model": <http://www.unicode.org/unicode/reports/tr17/>
There was (and still is) a lot of confusion about the terminology, and
UTR17 sets this straight.

| ISO 8859 ... has thirteen other character sets for different parts
| of the world, all of which are identical to ASCII in the lower 128
| characters and then add 128 additional characters.

That's 96 additional characters, not 128. 0x80-0x9F is the C1 region
for control characters (just as 0x00-0x1F is the C0 region for
controls), so only 0xA0-0xFF are left for graphic characters.
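
To make the arithmetic concrete, here is a small Python sketch (my own
illustration, not something taken from the chapter). It assumes that
Python's "latin-1" codec is an acceptable stand-in for ISO 8859-1 and
that the Unicode general category "Cc" identifies control characters;
the helper name is_control is made up for the example.

```python
import unicodedata


def is_control(byte_value: int) -> bool:
    """Return True if the byte is a control character in ISO 8859-1."""
    # Assumption: the "latin-1" codec maps byte N straight to U+00NN.
    ch = bytes([byte_value]).decode("latin-1")
    # "Cc" is the Unicode general category for control characters.
    return unicodedata.category(ch) == "Cc"


# C0 controls (0x00-0x1F) and C1 controls (0x80-0x9F).
c0 = [b for b in range(0x00, 0x20) if is_control(b)]
c1 = [b for b in range(0x80, 0xA0) if is_control(b)]

# Everything in the upper half that is not a control character.
graphic_high = [b for b in range(0x80, 0x100) if not is_control(b)]

print(len(c0), len(c1), len(graphic_high))  # prints: 32 32 96
print(hex(graphic_high[0]), hex(graphic_high[-1]))  # prints: 0xa0 0xff
```

The upper half has 128 code positions, but 32 of them are C1 controls,
which leaves the 96 graphic characters at 0xA0-0xFF.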

| In Common Lisp characters belong to a special data type and
| conversion between characters and numbers must be done with special
| functions. This mapping is not defined in the standard (the ANSI
| Common Lisp standard predates the Unicode effort by two years), but
| many implementations use Unicode.

The fact that the CL standard doesn't define which coded character set
is used has nothing to do with the fact that it predates Unicode. CL is
very carefully worded to allow implementations to choose whatever they
see fit and still be conforming. So the parenthetical remark might
sound offensive to Lisp devotees.

SY, Uwe

-- 
uwe@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen


