Re: UCA unnecessary collation weight 0000

From: Markus Scherer via Unicode <unicode_at_unicode.org>
Date: Thu, 1 Nov 2018 13:08:05 -0700

There are lots of ways to implement the UCA.

When you want fast string comparison, the zero weights are useful for
processing -- and you don't actually assemble a sort key.
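
To illustrate the comparison case, here is a minimal sketch, not ICU's actual code. It assumes a made-up three-level table (TOY_CES) rather than real DUCET data, and the function names are hypothetical. A zero weight marks a collation element as ignorable at that level, so the comparison skips it and walks the weights lazily, level by level, without ever building a key.

```python
from itertools import zip_longest

# Hypothetical toy table, NOT real DUCET data: each character maps to a
# list of (primary, secondary, tertiary) collation elements.
TOY_CES = {
    "a": [(0x1000, 0x0020, 0x0002)],
    "A": [(0x1000, 0x0020, 0x0008)],
    "b": [(0x1001, 0x0020, 0x0002)],
    "\u0301": [(0x0000, 0x0024, 0x0002)],  # combining acute: primary weight 0
}

def weights(s, level):
    """Lazily yield the nonzero weights of s at one level."""
    for ch in s:
        for ce in TOY_CES[ch]:
            w = ce[level]
            if w != 0:  # zero means "ignorable at this level": skip it
                yield w

def compare(x, y):
    """Return -1, 0 or 1 without assembling a sort key."""
    for level in range(3):
        pairs = zip_longest(weights(x, level), weights(y, level), fillvalue=0)
        for wx, wy in pairs:
            if wx != wy:
                return -1 if wx < wy else 1
    return 0
```

With this toy data, compare("a\u0301", "b") is decided on the primary level alone, because the accent's zero primary weight is skipped. A real implementation would stop at the first primary difference in a single pass and only revisit the lower levels when needed.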

People who want sort keys usually want them to be short, so you spend time
on compression. You probably also build sort keys as byte vectors, not
uint16 vectors (because byte vectors fit into more APIs and tend to be
shorter), as ICU does using the CLDR collation data file. The CLDR root
collation data file remunges all weights into fractional byte sequences,
and leaves gaps for tailoring.
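
As a rough sketch of the byte-vector idea (only loosely modeled on that approach, and omitting compression entirely), assume a made-up table of single-byte fractional weights (TOY_FRACTIONAL). Every weight byte is at least 0x02, so 0x01 can act as a level separator, and an empty byte string means "ignorable at this level". The table, the separator constant, and the function name are assumptions for illustration, not CLDR data or ICU API.

```python
# Hypothetical fractional weights, NOT real CLDR data: one short byte
# string per level (primary, secondary, tertiary). All weight bytes are
# >= 0x02 so that 0x01 is free to separate levels.
TOY_FRACTIONAL = {
    "a": (b"\x2a", b"\x05", b"\x05"),
    "A": (b"\x2a", b"\x05", b"\x9c"),
    "b": (b"\x2c", b"\x05", b"\x05"),  # 0x2b left unused as a tailoring gap
    "\u0301": (b"", b"\x8a", b"\x05"),  # no primary bytes: primary-ignorable
}
LEVEL_SEPARATOR = b"\x01"

def sort_key(s):
    """Build a byte-vector sort key: level bytes joined by the separator."""
    levels = []
    for level in range(3):
        levels.append(b"".join(TOY_FRACTIONAL[ch][level] for ch in s))
    return LEVEL_SEPARATOR.join(levels)
```

Keys built this way compare correctly with a plain bytewise comparison, for example sorted(words, key=sort_key). Ignorable levels contribute no bytes at all, so no zero weights ever appear in the key.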

markus
Received on Thu Nov 01 2018 - 15:08:41 CDT
