more compact Han collation tailorings

We should make better use of the compact syntax in the collation tailorings, especially in the large Han tailorings. These are machine-generated, so it should not be hard to fix the tool that produces them.

For example, the unihan tailorings look like this:

<'\uFDD0'⼂ # INDEX 3
<*丶 # 3.0
<*丷𪜊 # 3.1
<*丸义𠁼𠁽 # 3.2
<*丹为𠁿 # 3.3
<*主丼𠂀𠂁 # 3.4
<*𠂂 # 3.5
<*𪜋 # 3.7
<*举 # 3.8
<*𠂃 # 3.9
<*𠂄 # 3.12
<*𠂅 # 3.15

There should be one compact range per index marker (we need to break the compact syntax around the index-marker contractions). Instead, there is an extra <* for each new stroke count, just so that we can add the comments. Also, we use <* even when only a single code point follows; where that is still the case after merging, we should use a plain < instead. The data should look like this instead:

<'\uFDD0'⼂ # INDEX 3
<*丶丷𪜊丸义𠁼𠁽丹为𠁿主丼𠂀𠂁𠂂𪜋举𠂃𠂄𠂅

This would save dozens of kB in rule strings, especially when they are stored in UTF-16 (as in ICU), where each <* takes 4 bytes.
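The merge described above could be sketched roughly as follows. This is only an illustration in Python, not the actual generator tool: the function names compact_rules and utf16_bytes are hypothetical, and the sketch assumes that every comment starts at the first # on a line and that no # appears as a quoted literal in the rules.

```python
def compact_rules(lines):
    """Merge consecutive '<*' lines into a single compact range.

    Hypothetical helper for illustration; assumes '#' always starts a comment.
    Lines that do not start with '<*' (such as index markers) are kept as-is
    and break the current range.
    """
    out = []
    run = []  # characters collected from consecutive '<*' lines

    def flush():
        if not run:
            return
        chars = "".join(run)
        # len() counts code points, so a single supplementary character
        # still counts as one and gets the plain '<' operator.
        op = "<*" if len(chars) > 1 else "<"
        out.append(op + chars)
        run.clear()

    for line in lines:
        body = line.split("#", 1)[0].strip()  # drop the trailing comment
        if body.startswith("<*"):
            run.append(body[2:])
        else:
            flush()
            out.append(line.rstrip())
    flush()
    return out


def utf16_bytes(lines):
    """Size of the rule text in UTF-16, ignoring comments and line breaks."""
    text = "".join(line.split("#", 1)[0].strip() for line in lines)
    return len(text.encode("utf-16-le"))


sample = [
    "<'\\uFDD0'⼂ # INDEX 3",
    "<*丶 # 3.0",
    "<*丷𪜊 # 3.1",
    "<*丸义𠁼𠁽 # 3.2",
]
compacted = compact_rules(sample)
print("\n".join(compacted))
print(utf16_bytes(sample) - utf16_bytes(compacted), "bytes saved")
```

The per-stroke-count comments are dropped in this sketch, since they are the only reason for the extra <* operators in the first place.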

Similarly, in pinyin:

<'\uFDD0'A # INDEX A
<*阿呵𥥩锕𠼞𨉚 # ā
<*嗄 # á
<*啊 # a
<*𡉓哎哀唉𠳳埃娭挨欸㶼𡟓𢰇溾嗳𤸖銰锿噯鎄 # āi
<*𠊎𫘤啀捱皑溰䠹嘊敱敳㱯𤸳皚𦩴癌𧪚騃𩪂𩮖䶣 # ái

should be

<'\uFDD0'A # INDEX A
<*阿呵𥥩锕𠼞𨉚嗄啊𡉓哎哀唉𠳳埃娭挨欸㶼𡟓𢰇溾嗳𤸖銰锿噯鎄𠊎𫘤啀捱皑溰䠹嘊敱敳㱯𤸳皚𦩴癌𧪚騃𩪂𩮖䶣

The zhuyin tailoring has the same problem, though to a lesser degree.


