On 3/9/2016 7:08 PM, Oren Watson wrote:
I was surprised to find out that there are gaps in the Mathematical
Alphanumeric Symbols block (U+1D400 to U+1D7FF). The gaps are
associated with the inclusion of similar symbols in other blocks,
chiefly the Letterlike Symbols block.
Correct.
Examples of such gaps include U+1D49D, U+1D506, etc.
But as a matter of convenience and simplicity,
As a matter of history, the characters that would have gone into
those gaps were already encoded.
The stated purpose for alphanumerics in math is to serve for
variables. For example, that means they are not intended to be used
as list markers, which would have been a use case for which a
contiguous range would be essential. Variable names are not usually
indexed, but if they must show up in sorted lists, any capable sort
algorithm can be set up so the weights make them contiguous across
the gap (if the UCA tables do not do that by default already,
perhaps it's worth ensuring that they do).
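
As a rough illustration of the sorting point, here is a minimal Python sketch. It does not use the UCA at all; it stands in for tailored weights by folding each character with NFKD compatibility normalization, which is an assumption made purely for the demonstration. The helper name math_sort_key is hypothetical.

```python
import unicodedata


def math_sort_key(s: str):
    # Fold styled letters to their plain counterparts via compatibility
    # decomposition, then fall back to the original string so that
    # different styles of the same letter still order deterministically.
    return (unicodedata.normalize("NFKD", s), s)


# Script capitals C, B, A in scrambled order. B lives in Letterlike
# Symbols (U+212C); A and C live in Mathematical Alphanumeric Symbols.
script_abc = ["\U0001D49E", "\u212C", "\U0001D49C"]

# Plain code point order puts B first, because U+212C < U+1D49C.
by_code_point = sorted(script_abc)

# With the folded key the three letters come out as A, B, C.
by_folded_key = sorted(script_abc, key=math_sort_key)
```

A real collation library would assign the weights directly instead of going through normalization, but the effect on the gap is the same: the letter that sits in another block sorts between its neighbours.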
these
missing codepoints could have been defined, as decomposing
directly to the equivalents in Letterlike symbols, in the
same manner that the Ångström sign decomposes to the
letter Å. That would make these ranges contiguous.
The original case for the Ångström, as for the Kelvin, was that it had
been encoded twice in some other standards. The historical mistake
was not to encode them as part of the "squared" abbreviations, which
is where they came from, and instead to encode them separately in the
mistaken belief that it would be generally useful to have these,
rather than the regular Å and K, for the units.
None of that applies to the alphanumerics, so it's good to have
avoided the duplicate encoding.
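
For reference, the different behaviour of the two kinds of character can be checked with Python's standard unicodedata module. This is only a sketch; the exact data depends on the Unicode version bundled with the interpreter (see unicodedata.unidata_version), and the variable names are mine.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"
KELVIN_SIGN = "\u212A"
SCRIPT_CAPITAL_B = "\u212C"  # the Letterlike character filling the U+1D49D gap

# The unit signs have canonical (singleton) decompositions, so they do not
# survive any normalization form, including NFC.
angstrom_mapping = unicodedata.decomposition(ANGSTROM_SIGN)  # "00C5"
kelvin_mapping = unicodedata.decomposition(KELVIN_SIGN)      # "004B"
angstrom_nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)   # U+00C5
kelvin_nfc = unicodedata.normalize("NFC", KELVIN_SIGN)       # plain "K"

# Script capital B has only a compatibility (<font>) mapping, so NFC keeps
# it and only NFKC/NFKD fold it to the plain letter.
script_b_mapping = unicodedata.decomposition(SCRIPT_CAPITAL_B)  # "<font> 0042"
script_b_nfc = unicodedata.normalize("NFC", SCRIPT_CAPITAL_B)
script_b_nfkc = unicodedata.normalize("NFKC", SCRIPT_CAPITAL_B)
```

Code points that decomposed canonically to the Letterlike characters would behave like the first group: they would be replaced by the Letterlike character in any normalized text.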
Is there a
policy about leaving gaps in otherwise contiguous ranges of
codepoints?
I believe UTC tends to avoid gaps, but will leave them if the
circumstances of the case warrant that. In this case, not leaving
gaps and silently skipping the already encoded characters would have
misled users into expecting a complete alphabet, so the gap was the
less-bad alternative.
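
Anyone who wants to see exactly where the gaps fall can list the unassigned code points in the block. A minimal Python sketch follows, assuming that general category Cn in the interpreter's bundled Unicode data is an adequate test for "gap"; the function name is hypothetical.

```python
import unicodedata

# Mathematical Alphanumeric Symbols block, as given earlier in the thread.
BLOCK_START, BLOCK_END = 0x1D400, 0x1D7FF


def unassigned_in_block(start: int, end: int) -> list[int]:
    # General category "Cn" marks code points with no assigned character
    # in the Unicode version that ships with this Python build.
    return [
        cp for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


gaps = unassigned_in_block(BLOCK_START, BLOCK_END)
formatted = [f"U+{cp:04X}" for cp in gaps]
```

The list includes the two examples mentioned in the original question. Note that it also picks up a few reserved code points in the block that have no Letterlike counterpart, so not every entry is a gap of the kind discussed here.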
A./
Received on Thu Mar 10 2016 - 10:44:10 CST