Compatibility Casefold Equivalence from - - via Unicode on 2018-11-22 (Unicode Mail List Archive)

From: - - via Unicode <unicode_at_unicode.org>
Date: Thu, 22 Nov 2018 04:23:11 -0500 (EST)

Hi,

In Chapter 3, Section 3.13 (Default Case Algorithms), the Unicode Standard defines D146:

"A string X is a compatibility caseless match for a string Y if and only if: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) = NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))"
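
As a minimal sketch, the definition can be transcribed into Python. It assumes that `str.casefold()` is an acceptable stand-in for toCasefold (full case folding) and that the interpreter's `unicodedata` tables match the Unicode version of interest. The function names `compat_caseless_key` and `compat_caseless_match` are hypothetical, chosen only for illustration:

```python
import unicodedata


def compat_caseless_key(s: str) -> str:
    """Apply NFKD(toCasefold(NFKD(toCasefold(NFD(s))))) as written in D146."""
    # Innermost step: canonical decomposition.
    nfd = unicodedata.normalize("NFD", s)
    # First fold, then compatibility decomposition.
    inner = unicodedata.normalize("NFKD", nfd.casefold())
    # Second fold, then the outermost compatibility decomposition.
    return unicodedata.normalize("NFKD", inner.casefold())


def compat_caseless_match(x: str, y: str) -> bool:
    """Two strings match when both sides of the D146 equation are equal."""
    return compat_caseless_key(x) == compat_caseless_key(y)


if __name__ == "__main__":
    # U+FB01 LATIN SMALL LIGATURE FI against plain uppercase "FI".
    print(compat_caseless_match("\ufb01", "FI"))
    # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A against ASCII "a".
    print(compat_caseless_match("\uff21", "a"))
```

The key function returns the value that appears on each side of the equation, so a comparison is just an equality test on two keys.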

I am trying to understand the "if and only if" part of this. Specifically, why is the outermost NFKD necessary? Could it be an NFKC normalization instead? Is it okay to wrap the outer NFKD in an NFC or NFKC on both sides of the equation?

My use case is that I am trying to store user-provided tags in a database. I would like the tags to be deduplicated based on compatibility and caseless equivalence, which is how I ended up looking at D146. However, because decomposition can result in much larger strings, I would prefer to keep the stored version in NFC or NFKC (I *think* this doesn't matter after doing the casefolding as described above).
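
The storage scheme described here might be sketched as follows. This is only an illustration of the intent: the names `fold_key`, `storage_key`, and `dedupe_tags` are hypothetical, an in-memory dict stands in for the database, and `str.casefold()` is assumed to implement toCasefold. Whether composing the key with NFC is always safe is exactly the question asked above; the sketch does not prove it.

```python
import unicodedata


def fold_key(s: str) -> str:
    """Decomposed key from D146: NFKD(toCasefold(NFKD(toCasefold(NFD(s)))))."""
    nfd = unicodedata.normalize("NFD", s)
    inner = unicodedata.normalize("NFKD", nfd.casefold())
    return unicodedata.normalize("NFKD", inner.casefold())


def storage_key(s: str) -> str:
    """Hypothetical compact key: the D146 key wrapped in NFC for storage."""
    return unicodedata.normalize("NFC", fold_key(s))


def dedupe_tags(tags):
    """Keep the first spelling seen for each storage key (dict as a stand-in DB)."""
    seen = {}
    for tag in tags:
        seen.setdefault(storage_key(tag), tag)
    return list(seen.values())


if __name__ == "__main__":
    tags = ["Caf\u00e9", "CAFE\u0301", "\uff43\uff41\uff46\u00e9", "tea"]
    print(dedupe_tags(tags))
    # Compare the length of the decomposed key with the composed one.
    print(len(fold_key("CAFE\u0301")), len(storage_key("CAFE\u0301")))
```

Keeping the first spelling as the display form is a design choice made for the example; the composed key is used only for lookup and uniqueness.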

Thanks,

Carl

Received on Thu Nov 22 2018 - 09:51:06 CST
