kenwhistler at att.net
Mon Mar 14 12:01:46 CDT 2016
On 3/13/2016 12:03 PM, Doug Ewell wrote:
> My point is that of J.S. Choi and Janusz Bień: the problem with
> declaring NamesList off-limits is that it does contain information
> that is either:
> • not available in any other UCD file, or
> • available, but only in comments (like the MAS mappings), which aren't
> supposed to be parsed either.
NamesList.txt is not "off-limits". The information in it is there because it
is useful for publication in the Unicode code charts, to help with the
identification and interpretation of the characters in the standard.
And because NamesList.txt itself is published as part of the UCD, nobody
is going to stop you (or anybody else) from parsing information out of it.
The trick is this: the status of annotational data in NamesList.txt is
different than that of normative data like the code points, names, formal name
aliases, decomposition mappings, and standardized variation sequences.
Annotations are -- well, annotational -- and there are no guarantees
about their completeness or stability, and so on. They emerge from a
kind of ongoing rugby scrum between the UTC members,
national body comments on 10646 amendments, public suggestions
via feedback and email lists, and the ability of editors to accommodate
reasonable suggestions that might help the readability and usefulness
of the names list without larding it up too heavily with extraneous
information that would make it *harder* to use.
People who parse NamesList.txt for data almost inevitably and immediately
end up expecting it to do things it does not (and cannot reasonably) do.
See this thread right here for pertinent examples. *That* is the problem
I see, because it then tends to lead to frustrated clamoring for
NamesList.txt to be "fixed" to do things and carry information that
it wasn't (and isn't) designed for.
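
For anyone who does parse the file anyway, the difference in status described above can at least be kept visible in the parsed result. The following is a minimal sketch, not an official parser. It assumes the line markers documented for the NamesList format (tab-indented lines starting with "%", ":", "#", "~", "=", "*" or "x"). The function name split_entries, the two lookup tables, and the sample text are hypothetical and exist only for illustration.

```python
import re

# Assumed marker-to-kind tables, following the documented NamesList line
# types. The normative kinds mirror data that is authoritative elsewhere in
# the UCD; the annotation kinds carry no completeness or stability guarantee.
NORMATIVE = {
    "%": "formal_alias",
    ":": "canonical_decomposition",
    "#": "compatibility_decomposition",
    "~": "variation_sequence",
}
ANNOTATION = {
    "=": "informative_alias",
    "*": "note",
    "x": "cross_reference",
}

# A character entry starts with a hex code point, a tab, and the name.
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def split_entries(text):
    """Group the lines of each character entry into normative data and annotations."""
    entries = {}
    current = None
    for line in text.splitlines():
        match = NAME_LINE.match(line)
        if match:
            current = {"name": match.group(2), "normative": [], "annotations": []}
            entries[int(match.group(1), 16)] = current
            continue
        if line.startswith("@"):
            # Block headers, subheaders and notices end the current entry.
            current = None
            continue
        if current is None or not line.startswith("\t"):
            continue
        body = line[1:]
        marker, _, rest = body.partition(" ")
        if marker in NORMATIVE:
            current["normative"].append((NORMATIVE[marker], rest))
        elif marker in ANNOTATION:
            current["annotations"].append((ANNOTATION[marker], rest))
        else:
            # Anything unrecognized is treated as annotational, never as data.
            current["annotations"].append(("other", body))
    return entries


# Illustrative excerpt written in NamesList.txt style (not copied from a release).
SAMPLE = (
    "@@\t0080\tLatin-1 Supplement\t00FF\n"
    "00C5\tLATIN CAPITAL LETTER A WITH RING ABOVE\n"
    "\t* Danish, Norwegian, Swedish, Walloon\n"
    "\tx (angstrom sign - 212B)\n"
    "\t: 0041 030A\n"
)

entries = split_entries(SAMPLE)
print(entries[0x00C5]["normative"])
print(entries[0x00C5]["annotations"])
```

Keeping the two lists apart means downstream code has to opt in explicitly before relying on a note or cross reference, which is a reminder that such lines may be added, reworded, or dropped between versions.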