The Proposed Draft UTR #42: An XML Representation of the UCD is available for review.
In addition, a representation of the 5.0.0 UCD in XML is available. There are six files, each available in zip/jar format; the sizes given are those of the archives:
contents | flat | grouped |
no Unihan data | 456 KB | 308 KB |
Unihan data only | 4,835 KB | 4,838 KB |
complete UCD | 5,904 KB | 5,145 KB |
The flat versions do not use the group mechanism. The grouped versions use the group mechanism, with groups corresponding approximately to the blocks (a few blocks have been subdivided).
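As a rough illustration of the difference between the two layouts, the sketch below expands a grouped document into flat per-character records. It assumes, following the draft, that a group element carries default attribute values which its child char elements inherit unless they override them. The namespace URI, the element names (repertoire, group, char), the attribute names (cp, na, gc, sc) and the sample data are assumptions written by hand for this example, not an excerpt from the published files; check them against the draft and the actual data before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the UTR #42 draft; verify against the real files.
UCD_NS = "http://www.unicode.org/ns/2003/ucd/1.0"

# Tiny hand-written sample (hypothetical, not copied from the published data).
SAMPLE = f"""<ucd xmlns="{UCD_NS}">
  <repertoire>
    <group gc="Lu" sc="Latn">
      <char cp="0041" na="LATIN CAPITAL LETTER A"/>
      <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
    </group>
    <char cp="0020" na="SPACE" gc="Zs" sc="Zyyy"/>
  </repertoire>
</ucd>"""

GROUP_TAG = f"{{{UCD_NS}}}group"
CHAR_TAG = f"{{{UCD_NS}}}char"


def iter_chars(root):
    """Yield one attribute dict per char element.

    Attributes on an enclosing group act as defaults; attributes set
    directly on the char element take precedence.
    """
    repertoire = root.find(f"{{{UCD_NS}}}repertoire")
    for node in repertoire:
        if node.tag == GROUP_TAG:
            for child in node:
                if child.tag == CHAR_TAG:
                    # Group defaults first, then the char's own values.
                    yield {**node.attrib, **child.attrib}
        elif node.tag == CHAR_TAG:
            # Ungrouped char: its attributes are already complete.
            yield dict(node.attrib)


if __name__ == "__main__":
    for record in iter_chars(ET.fromstring(SAMPLE)):
        print(record["cp"], record["gc"], record["sc"], record["na"])
```

Applied to a flat file, the same function simply takes the ungrouped branch for every character, so one reader can handle both layouts.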
The "no Unihan data" files do not contain the properties expressed in the Unihan.txt UCD data file. As a consequence, the nt and nv attributes in these files are not necessarily correct for the ideographs, although they do match the values given in UnicodeData.txt. The "Unihan data only" files contain only the properties and code points expressed in Unihan.txt. The "complete UCD" files reflect the complete UCD data.
A zip/jar archive (21,484 KB) of the six files is also available.