Re: Library of Congress diacriticized character data

From: Mark Davis (mark_davis@taligent.com)
Date: Mon Jul 28 1997 - 13:25:20 EDT


I would be quite surprised if only 312 had Unicode equivalents. Which
combining marks are missing, such that the remaining characters cannot
be composed?
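
To illustrate the distinction at issue (a single precomposed code point
versus a base letter followed by combining marks), here is a minimal
sketch. It assumes Python's standard unicodedata module, which reflects
a much later Unicode version than the one current in 1997, and the
helper names are hypothetical. Note also that NFC skips a few
precomposed characters listed as composition exclusions, so the first
helper can under-report.

```python
import unicodedata


def precomposed_equivalent(sequence):
    """Return the single precomposed character for a base+combining
    sequence, or None if NFC normalization leaves more than one code point.

    Hypothetical helper for illustration only.
    """
    composed = unicodedata.normalize("NFC", sequence)
    return composed if len(composed) == 1 else None


def is_composable(sequence):
    """Return True if every code point in the sequence is a named Unicode
    character, i.e. the sequence can be written as base + combining marks
    even when no precomposed form exists.

    Hypothetical helper for illustration only.
    """
    return all(unicodedata.name(ch, "") for ch in sequence)


# A WITH ACUTE has a precomposed form (U+00C1).
a_acute = "A\u0301"
# A WITH DOUBLE ACUTE has no precomposed form, but it is still
# composable as A + COMBINING DOUBLE ACUTE ACCENT (U+030B).
a_double_acute = "A\u030B"

for seq in (a_acute, a_double_acute):
    print(
        [f"U+{ord(ch):04X}" for ch in seq],
        "precomposed:", precomposed_equivalent(seq) is not None,
        "composable:", is_composable(seq),
    )
```

Under these assumptions, a character counts as "missing" only if some
combining mark it needs has no code point at all, not merely because it
lacks a precomposed form.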

(I was unable to access
http://www.locke.ccil.org/~cowan/elsie/elsie.html; Communicator said
that www.locke.ccil.org lacked a DNS entry.)

Mark

John Cowan wrote:

> I have posted the L of C data on diacriticized characters to
> my Web page at http://www.locke.ccil.org/~cowan/elsie/elsie.html .
> This is the result of the printouts that James Agenbroad sent me
> some months ago. I typed them in, massaged them to get USMARC
> and Unicode equivalents, et voila.
>
> The data illustrate the diverse base+combining characters that
> are needed for bibliographic purposes. There are 1152 diacriticized
> characters in the file, of which only 312 have Unicode equivalents.
> (I may have missed some, and some sequences are probably bogus
> encodings, like A WITH ACUTE WITH ACUTE, which is probably an error
> for A WITH DOUBLE ACUTE.)
>
> Enjoy!
>
> --
> John Cowan cowan@ccil.org
> e'osai ko sarji la lojban.




