Library of Congress diacriticized character data

From: John Cowan (cowan@locke.ccil.org)
Date: Mon Jul 28 1997 - 01:25:31 EDT


I have posted the L of C data on diacriticized characters to
my Web page at http://www.locke.ccil.org/~cowan/elsie/elsie.html .
This is the result of the printouts that James Agenbroad sent me
some months ago. I typed them in, massaged them to get USMARC
and Unicode equivalents, et voilà.

The data illustrate the diverse base+combining characters that
are needed for bibliographic purposes. There are 1152 diacriticized
characters in the file, of which only 312 have Unicode equivalents.
(I may have missed some, and some sequences are probably bogus
encodings, like A WITH ACUTE WITH ACUTE, which is probably an error
for A WITH DOUBLE ACUTE.)
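
For readers who want to check whether a given base+combining sequence
has a single-character Unicode equivalent, here is a minimal sketch in
Python. It is not the procedure used to build the file; it simply asks
the standard unicodedata module to compose the sequence under NFC and
sees whether one character comes out. The helper name
precomposed_equivalent is made up for illustration, and the answers
reflect whatever Unicode version your Python ships with, so counts
obtained this way need not match the 312 above.

```python
import unicodedata

def precomposed_equivalent(base, *combining_marks):
    """Return the single precomposed character for base + marks, or None.

    Hypothetical helper: composes the sequence with NFC and reports a
    result only if it collapses to exactly one code point.
    """
    sequence = base + "".join(combining_marks)
    composed = unicodedata.normalize("NFC", sequence)
    return composed if len(composed) == 1 else None

ACUTE = "\u0301"         # COMBINING ACUTE ACCENT
DOUBLE_ACUTE = "\u030B"  # COMBINING DOUBLE ACUTE ACCENT

for label, marks in [("A + acute", ("A", ACUTE)),
                     ("A + acute + acute", ("A", ACUTE, ACUTE)),
                     ("A + double acute", ("A", DOUBLE_ACUTE)),
                     ("O + double acute", ("O", DOUBLE_ACUTE))]:
    result = precomposed_equivalent(*marks)
    if result is None:
        print(label, "-> no single-character equivalent")
    else:
        print(label, "-> U+%04X" % ord(result), unicodedata.name(result))
```

A WITH ACUTE composes to a single character, while A WITH ACUTE WITH
ACUTE and A WITH DOUBLE ACUTE do not; only the base+combining form can
represent them. A few characters are excluded from NFC composition, so
a None result from this sketch is a strong hint rather than proof.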

Enjoy!

-- 
John Cowan					cowan@ccil.org
		e'osai ko sarji la lojban.


