Re: Library of Congress diacriticized character data

From: Joan Aliprand (BR.JMA@rlg.org)
Date: Mon Jul 28 1997 - 15:09:40 EDT


Jörg Knappen writes:

>Interestingly, only standard latin letters occurred as base letters, but not
>additional latin letters. What has happened to combinations like
>OPEN E WITH TILDE AND ACUTE (several african languages)? Do they never show
>up in the LOC data or were they filtered out by preprocessing?

The base letter repertoire in this collection is the USMARC Latin character
set, whose repertoire as currently implemented is a subset of ANSI/NISO
Z39.47, American National Standard Extended Latin Alphabet Coded Character
Set for Bibliographic Use ("ANSEL").

OPEN E is not one of the additional letters in ANSEL.
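For comparison, Unicode has no precomposed OPEN E WITH TILDE AND ACUTE, but it can represent the combination as a base letter followed by combining marks. A minimal Python sketch using only the standard `unicodedata` module (the lowercase letter is chosen purely for illustration):

```python
import unicodedata

# OPEN E WITH TILDE AND ACUTE as a base letter followed by combining
# marks (Unicode order: base first, then the marks).
seq = "\u025B\u0303\u0301"

for ch in seq:
    # e.g. "U+025B LATIN SMALL LETTER OPEN E"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC composition finds no precomposed character for this combination,
# so the normalized string is still three code points.
nfc = unicodedata.normalize("NFC", seq)
print(len(nfc))  # prints 3
```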

>Some combinations also look bogus to me (I with candrabindu = I (dotted)
>with tilde/circumflex probably).

The combinations in this list come from two sources: natural languages, and
transliteration of certain languages according to "ALA/LC Romanization
Tables" (published by the Library of Congress).

Bogus cases may be due to a misunderstanding of the original source, or to
typographical errors.

The ANSEL standard includes tables of use (by language, and by individual
character) for the character modifiers and special characters in the
Extended ASCII range. However, they do not include all combinations for
Vietnamese, nor recent combinations for transliteration authorized by the
American Library Association and the Library of Congress.
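ANSEL's character modifiers are nonspacing marks that precede the base letter they modify, whereas Unicode places combining marks after the base. A minimal Python sketch of that reordering, assuming a small subset of the modifier table (grave, acute, circumflex, tilde, umlaut, candrabindu) and plain ASCII base letters; `ansel_to_unicode` and `ANSEL_MODIFIERS` are hypothetical names for illustration, not part of any standard or Library of Congress tool:

```python
import unicodedata

# Partial, illustrative table: a few ANSEL (Z39.47) modifier bytes mapped
# to Unicode combining characters. Not the full repertoire.
ANSEL_MODIFIERS = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut (diaeresis)
    0xEF: "\u0310",  # candrabindu
}


def ansel_to_unicode(data: bytes) -> str:
    """Convert ASCII letters plus the ANSEL modifiers above to NFC Unicode.

    Modifiers seen before a base letter are buffered, then emitted after
    it in the same relative order (an assumption of this sketch).
    """
    out = []
    pending = []
    for b in data:
        if b in ANSEL_MODIFIERS:
            pending.append(ANSEL_MODIFIERS[b])
        elif b < 0x80:
            out.append(chr(b))   # base letter first...
            out.extend(pending)  # ...then its combining marks
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    if pending:
        raise ValueError("modifier(s) at end of data with no base letter")
    return unicodedata.normalize("NFC", "".join(out))
```

With this ordering, a Vietnamese e with circumflex and acute (circumflex and acute modifiers followed by `e`) comes out as the precomposed U+1EBF after NFC, while I with candrabindu, which has no precomposed form, stays as `I` followed by U+0310.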

-- Joan Aliprand
   Research Libraries Group

To: UNICODE@UNICODE.ORG



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT