annotations (was: NamesList.txt as data source)

Janusz S. Bień jsbien at
Sun Mar 13 00:55:24 CST 2016

On Thu, Mar 10 2016 at 22:40 CET, kenwhistler at writes:

> The *reason* that NamesList.txt exists at all is to drive the tool,
> unibook, that formats the full Unicode code charts for posting. 


On Fri, Mar 11 2016 at  3:13 CET, asmusf at writes:
> On 3/10/2016 5:49 PM, "J. S. Choi" wrote:

>> One thing about NamesList.txt is that, as far as I have been able to
>> tell, it’s the only machine-readable, parseable source of those
>> annotations and cross-references.


> This is a different issue. The nameslist.txt is a reasonable source
> for driving other formatting programs than just Unibook.


A student of mine wrote a font sampling program that produces output in a
Unibook-like form. For this purpose he also wrote a converter from the
NamesList format to XML:

I use the XML version of NamesList to add my own comments on
characters (work in progress):

Other examples of NamesList.txt use are

Although these are not exactly formatting programs, in my opinion they
also constitute a valid use.
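
To illustrate the kind of NamesList-to-XML conversion discussed here, a
minimal sketch in Python follows. It is not the converter mentioned above.
The element names (nameslist, char, alias, comment, xref) and the function
name are hypothetical, and the sample entry is only illustrative and may not
match the published file verbatim. Only three annotation markers are
handled; headers and all other line types are skipped.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative entry in NamesList style (tab-separated fields).
# Assumption: this text may differ from the published NamesList.txt.
SAMPLE = (
    "018D\tLATIN SMALL LETTER TURNED DELTA\n"
    "\t= reversed Polish-hook o\n"
    "\t* archaic phonetic for labialized alveolar fricative\n"
)

# Hypothetical mapping from annotation marker to XML element name.
KINDS = {"=": "alias", "*": "comment", "x": "xref"}

# A character line: hexadecimal code point, a tab, then the name.
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def nameslist_to_xml(text):
    """Convert NamesList-style text into a small XML tree."""
    root = ET.Element("nameslist")
    current = None
    for line in text.splitlines():
        match = CHAR_LINE.match(line)
        if match:
            current = ET.SubElement(
                root, "char", cp=match.group(1), name=match.group(2)
            )
        elif line.startswith("\t") and current is not None:
            # Annotation line: a marker, a space, then the annotation text.
            marker, _, rest = line.strip().partition(" ")
            tag = KINDS.get(marker)
            if tag:
                ET.SubElement(current, tag).text = rest
        else:
            # Headers, file comments, and blank lines end the current entry.
            current = None
    return root


if __name__ == "__main__":
    print(ET.tostring(nameslist_to_xml(SAMPLE), encoding="unicode"))
```

Once the annotations are in a tree like this, one's own comments can be
attached to each char element without touching the original file.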

> In fact, the possibility of reuse in this context is probably among the
> unstated rationales for making the information and syntax available in
> the first place.

I understand there is no intention to make an official XML version of
the file as it would require changes in Unibook?


>> What are these other primary sources that maintain these other
>> annotation data; are they publicly available? If the name list is the
>> only place where these sources’ data have been published, then, for
>> better or for worse, the name list is all that is available for much
>> information on many code points’ usage.

> See my first through third paragraph.

You wrote:


> There are explanations about character use that are only maintained in
> the PDF of the core specification, where this information is packaged
> in a way that can be understood by a human reader, but is not amenable
> to be extracted by machine.
> While the annotations, comments, cross references etc. in Namelist.txt
> appear, formally, to be machine extractable, the way they are created
> and managed make them just as much "human-accessible" only as the core
> specification.

I'm afraid it's not clear to me. Let's take an example. Some time ago I
inquired about a controversial alias for U+018D:

Can I really find anything about "reversed Polish-hook o" in the core
specification which is not a literal copy of the information from

Best regards


Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
