String Ranges in Unicode Sets
doug at ewellic.org
Tue Sep 8 10:19:03 CDT 2015
Mark Davis <mark at macchiato dot com> wrote:
>> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
>> interpretations of two canonical-equivalent character sequences are
>> distinct."
> A compiler will take source code containing String x="á"; and compile
> it to a certain binary. If that same source code is NFD'd, the
> compiler will produce a different result.
> Do you really think that such a compiler is not compliant with Unicode??
> If so, then we should add some more clarifications around C6.
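To make the compiler example concrete, here is a minimal Python sketch (Python is my choice for illustration; nothing in this thread depends on it). It shows that the precomposed (NFC) and decomposed (NFD) forms of "á" differ at the code-point and UTF-8 byte level even though they are canonically equivalent. A compiler that embeds a literal's bytes will therefore emit different output for the two sources, while normalizing either form to NFC recovers the same string.

```python
import unicodedata

# "á" as one precomposed code point (NFC) and as base letter + combining acute (NFD)
nfc = unicodedata.normalize("NFC", "\u00e1")
nfd = unicodedata.normalize("NFD", "\u00e1")

def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(nfc))      # ['U+00E1']
print(code_points(nfd))      # ['U+0061', 'U+0301']
print(nfc.encode("utf-8"))   # b'\xc3\xa1'
print(nfd.encode("utf-8"))   # b'a\xcc\x81'

# Compared code point by code point, the strings differ...
print(nfc == nfd)            # False
# ...but after normalization they are the same text.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

Comparing raw code points here is a legitimate internal operation; the sketch only shows why such a comparison cannot by itself stand in for "same text."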
I agree. The word "interpretations" in C6 can't have been intended to
include the interpretation of code points qua code points. That would
make a great many internal processes impossible.
I think of C6 as meaning that spell-checkers, for example, should not
treat José (NFC, four code points, ending in U+00E9) and José (NFD,
five code points, ending in U+0065 U+0301) as separate entries.
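A minimal sketch of that spell-checker behavior, assuming a hypothetical `SpellDictionary` class (not any real spell-checker's API) that keys every word by its NFC form. The NFC and NFD spellings of "José" then collapse to a single entry, and a lookup with either form succeeds.

```python
import unicodedata

class SpellDictionary:
    """Hypothetical word list that treats canonically equivalent spellings as one entry."""

    def __init__(self) -> None:
        self._words: set[str] = set()

    @staticmethod
    def _key(word: str) -> str:
        # Normalize to NFC so canonically equivalent sequences share one key.
        return unicodedata.normalize("NFC", word)

    def add(self, word: str) -> None:
        self._words.add(self._key(word))

    def __contains__(self, word: str) -> bool:
        return self._key(word) in self._words

    def __len__(self) -> int:
        return len(self._words)

nfc_jose = "Jos\u00e9"   # J o s U+00E9: four code points
nfd_jose = "Jose\u0301"  # J o s e U+0301: five code points

d = SpellDictionary()
d.add(nfc_jose)
d.add(nfd_jose)
print(len(nfc_jose), len(nfd_jose))  # 4 5
print(len(d))                        # 1
print(nfd_jose in d)                 # True
```

NFD would work equally well as the key, as long as every insertion and lookup uses the same form.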
Doug Ewell | http://ewellic.org | Thornton, CO