Fix GenerateUnihanCollators (old compensating code)

GenerateUnihanCollators generates the data for several tailorings. It contains a fair amount of code that tries to compensate for earlier, bad Unihan data. That code may no longer be necessary, and it may even interfere with newer, corrected data.

Review the code to see whether those hacks can be removed.


The data files (and corresponding internal code) to look at are the patch files:


Part of what the code does is to synthesize total stroke counts for characters where that data is missing (and only for those). It does this from the radical-stroke info, by adding the stroke count of the radical to the residual stroke count. While clearly an approximation, this is better than having no information at all. Wherever the patch files provide stroke info, that info then overrides the synthesized value.
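A minimal sketch of that fallback order, assuming three inputs the ticket does not spell out: patch-file stroke counts, real Unihan total stroke counts, and kRSUnicode-style radical-stroke values such as "85.5" (radical 85, 5 residual strokes). The class name, method name, and map parameters are hypothetical and are not the actual GenerateUnihanCollators code. It also shows why the sum is only an approximation: a character written with a reduced radical form, such as 氵 (3 strokes) for radical 85 水 (4 strokes), gets a count one stroke too high.

```java
import java.util.Map;
import java.util.OptionalInt;

/**
 * Hypothetical sketch of the total-stroke fallback (not the real code).
 * Assumed precedence: patch files, then Unihan data, then radical + residual.
 */
final class StrokeFallback {
    private StrokeFallback() {}

    static OptionalInt totalStrokes(
            int codePoint,
            Map<Integer, Integer> patchStrokes,        // assumed: from the patch files
            Map<Integer, Integer> unihanTotalStrokes,  // assumed: real total stroke counts
            Map<Integer, String> radicalStrokeValues,  // assumed: kRSUnicode-style "R.S"
            Map<Integer, Integer> radicalStrokeCounts) // assumed: radical number -> strokes
    {
        // 1. Patch-file data always wins.
        Integer patched = patchStrokes.get(codePoint);
        if (patched != null) return OptionalInt.of(patched);

        // 2. Otherwise use a real total stroke count if there is one.
        Integer known = unihanTotalStrokes.get(codePoint);
        if (known != null) return OptionalInt.of(known);

        // 3. Only when data is missing: synthesize radical strokes + residual strokes.
        String rs = radicalStrokeValues.get(codePoint);
        if (rs == null) return OptionalInt.empty();
        String first = rs.trim().split("\\s+")[0]; // use the first value if several are listed
        int dot = first.indexOf('.');
        if (dot < 0) return OptionalInt.empty();
        // Apostrophes mark simplified radical forms; this sketch ignores that distinction.
        int radical = Integer.parseInt(first.substring(0, dot).replace("'", ""));
        int residual = Integer.parseInt(first.substring(dot + 1));
        Integer radicalStrokes = radicalStrokeCounts.get(radical);
        if (radicalStrokes == null) return OptionalInt.empty();
        return OptionalInt.of(radicalStrokes + residual);
    }
}
```

Returning `OptionalInt` keeps "no information at all" distinct from a real count, so a caller can decide how to sort characters that have no stroke data.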

CJK_Radicals.csv should use the newer Unicode file

For Unicode 9, in http://www.unicode.org/utility/trac/changeset/1047 I changed GenerateUnihanCollators to get most of the radical-stroke data from org.unicode.text.UCA.RadicalStroke, both to get it working again and to reduce duplicate parsing code.

The old CJK_Radicals.csv appears to have data for all of the CJK Radicals Supplement block (2E80..2EFF), which seems to have been used for "closure" of the old data structure, so I am still using it for fallbacks via the radicalMap. Only some of those mappings can be derived from the UCD file CJKRadicals.txt; otherwise I could have pushed most of the fallback handling down into UCA.RadicalStroke.
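A minimal sketch of that "UCD first, CSV as fallback" arrangement, assuming the radicalMap maps a radical character (Kangxi Radicals or CJK Radicals Supplement) to a unified ideograph, and assuming CJKRadicals.txt lines with three semicolon-separated fields (radical number; radical character; unified ideograph, in hex). The class and method names are hypothetical, and the CSV side is taken as an already-parsed map because its column layout is not described here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of building a radical map with CSV fallbacks (not the real code). */
final class RadicalMapSketch {
    private RadicalMapSketch() {}

    /** Parses "number; radicalChar; ideograph" lines, skipping comments and blank lines. */
    static Map<Integer, Integer> parseCjkRadicals(List<String> lines) {
        Map<Integer, Integer> result = new HashMap<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) continue;
            String[] fields = data.split(";");
            if (fields.length < 3) continue;
            String radicalChar = fields[1].trim();
            String ideograph = fields[2].trim();
            if (radicalChar.isEmpty() || ideograph.isEmpty()) continue; // nothing to map
            result.put(Integer.parseInt(radicalChar, 16), Integer.parseInt(ideograph, 16));
        }
        return result;
    }

    /** UCD mappings win; CSV entries only fill in radicals the UCD file does not cover. */
    static Map<Integer, Integer> buildRadicalMap(
            Map<Integer, Integer> fromUcd, Map<Integer, Integer> fromCsv) {
        Map<Integer, Integer> merged = new HashMap<>(fromCsv);
        merged.putAll(fromUcd); // overwrite CSV entries wherever the UCD has a mapping
        return merged;
    }
}
```

If CJKRadicals.txt eventually covered the whole 2E80..2EFF range, the CSV map would contribute nothing and the merge step could move into UCA.RadicalStroke, which is the trade-off described above.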

I reviewed the code, and it looks good to me.


