The UCD is currently offered on http://www.unicode.org/Public as:
The main reason for this organization was to minimize the size of the UCD data, in part to make downloads easier. However, this approach also has a number of problems:
Reconstructing an old version of the UCD is a non-trivial task. Besides having to go to previous versions to find some files, the user must also know to ignore some older files which are not propagated (e.g. Props.txt in 2.1.9 is not part of 3.0), or which have been renamed (e.g. DerivedNormalizationProperties.txt in 3.1 renamed to DerivedNormalizationProps.txt in 3.2).
Since an Update directory does not necessarily include all the files of that release, and those present have different names than they have in the UNIDATA directory, we cannot make hyperlinks between the files. For example, UCD.html mentions StandardizedVariant.html, but does not link to it. Hyperlinks would force us to go across Update directories, and/or make some files different between the UNIDATA and the Update directories.
The space saving afforded by the organization is no longer a constraint. A complete UCD 4.0, including Unihan.txt, is less than 33 megabytes; and a compressed ZIP is less than 7 megabytes. Furthermore, each release tends to touch most of the files, thereby defeating the incremental organization.
The overall proposal is to stop publishing new Update directories, and instead to publish each version of the UCD as a self-contained set of files.
Here are more specific details of this proposal:
The proposed layout is to have one subdirectory in http://www.unicode.org/Public for each release:
4.0.0/
The ucd directories would contain all the UCD files for the corresponding releases, and hyperlinks between the files (represented as relative links) would be allowed.
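As a minimal sketch of why this layout makes relative hyperlinks workable, the following Python resolves a link from one UCD file to another within the same release. The helper names `ucd_file_url` and `resolve_link` are hypothetical, and the `<version>/ucd/<file>` path shape is taken from the layout described in this proposal.

```python
from urllib.parse import urljoin

# Root of the public area named in this proposal.
BASE = "http://www.unicode.org/Public/"


def ucd_file_url(version: str, filename: str) -> str:
    """Build the URL of one UCD file under the per-release layout."""
    return urljoin(BASE, f"{version}/ucd/{filename}")


def resolve_link(version: str, from_file: str, relative_link: str) -> str:
    """Resolve a relative hyperlink found in from_file (hypothetical helper)."""
    return urljoin(ucd_file_url(version, from_file), relative_link)


if __name__ == "__main__":
    # A relative link in UCD.html stays inside the same release directory.
    print(resolve_link("4.0.0", "UCD.html", "StandardizedVariant.html"))
```

Because every file of a release sits in the same ucd directory, a plain relative link never has to cross into another release.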
The purpose of the new intermediate ucd directory is to provide a home for other data that is part of a release, such as specific versions of the UAXes or the code charts. Ultimately, the last published book plus the content of the directory for a release would form a complete definition of the corresponding version of the standard. However, adding those components is not part of this proposal.
The UNIDATA entry would be retained, and be made to have the same content as the directory of the latest version (either by some linking/redirecting magic, or simply by having a copy of the same content).
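The "copy of the same content" option could look roughly like the sketch below. It assumes that UNIDATA mirrors the ucd directory of the latest release (the text above leaves open exactly which directory is mirrored), and the function name `refresh_unidata` is hypothetical.

```python
import shutil
from pathlib import Path


def refresh_unidata(public_root: Path, latest: str) -> Path:
    """Replace UNIDATA with a fresh copy of <latest>/ucd.

    Assumption: UNIDATA mirrors the release's ucd directory.
    """
    source = public_root / latest / "ucd"
    target = public_root / "UNIDATA"
    if target.exists():
        # Drop the previous copy so stale files do not linger.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

A redirect or symbolic link would avoid the duplicate storage, at the cost of depending on server or file system support.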
We should also rebuild the directories corresponding to earlier releases, starting with 2.0.0 (it is just not worth going further back in time, and the data is not available in electronic form for all the 1.x releases).
We should also provide a ZIP file for each release, simply to facilitate http-based access. The proposal is to have one ZIP file named ucd-release.zip per release, placed in the directory for that release (that is, next to the ucd directory it contains). Filenames in that ZIP file would include directories, starting with the directory of the release; in other words, the file 4.0.1/ucd-4.0.1.zip would contain:
4.0.1/ucd/ArabicShaping.txt

Author: Eric Muller
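The ZIP layout proposed above can be sketched with Python's standard `zipfile` module. This is an illustration under stated assumptions, not the actual release tooling: the function name `build_release_zip` and the local `public_root` directory are hypothetical.

```python
import zipfile
from pathlib import Path


def build_release_zip(public_root: Path, version: str) -> Path:
    """Create <version>/ucd-<version>.zip next to the ucd directory."""
    release_dir = public_root / version
    zip_path = release_dir / f"ucd-{version}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted((release_dir / "ucd").rglob("*")):
            if path.is_file():
                # Member names start with the release directory,
                # e.g. 4.0.1/ucd/ArabicShaping.txt
                archive.write(path, path.relative_to(public_root).as_posix())
    return zip_path
```

Starting member names at the release directory means that archives for several releases can be unpacked into the same place without overwriting one another.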
Revision | Date | Comments |
August 5, 2004 | Initial version |