Meroitic cursive fractions numerical values
Karl Williamson
public at khwilliamson.com
Mon Mar 30 14:38:54 CDT 2015
On 03/29/2015 03:41 AM, Andrew West wrote:
> On 28 March 2015 at 20:05, Karl Williamson <public at khwilliamson.com> wrote:
>>
>> Existing software that looks at the numeric values of characters is written
>> expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad)
> which parses the numeric values of characters for numeric sorting
> purposes, and it parses "6/12" for MEROITIC CURSIVE FRACTION SIX
> TWELFTHS as 0.5. Personally I find it hard to imagine how you could
> write software that accepts "6/12" as input and is unable to come up
> with the answer of a half.

The statement is not rash, as it is simply a statement of objective
fact. I am the maintainer of software that fails with beta 8.0 due to
this change. And it has nothing to do with not being able to do
arithmetic division; your assumption was wrong.

The software essentially creates a database of Unicode properties for
regular expression pattern matching, so that someone can say
/\p{Numeric_Value=0.5}/
and quickly determine if the matched string contains a code point with
that characteristic. Because the database is copied as-is to many
different computers with different word sizes and different floating
point implementations, it can't do the division ahead of time because of
the inherent fuzziness of floating point numbers. It solves this the
same way Unicode has, by leaving rational numbers in their original
precisely specified format. Thus it creates a table for the
property-value combination of Numeric_Value and 1/2, taking the UCD
value as-is.
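
To illustrate the idea, here is a minimal Python sketch. It is not the actual software: the sample entries and the names build_tables and lookup are made up for the example, and converting the query to an exact fraction before the lookup is an assumption about how a request such as 0.5 could be matched against a table keyed by 1/2 without storing any floating point value.

```python
from fractions import Fraction

# Hypothetical sample of UCD-style entries: character name -> Numeric_Value string.
UCD_SAMPLE = {
    "VULGAR FRACTION ONE HALF": "1/2",
    "VULGAR FRACTION ONE QUARTER": "1/4",
    "DIGIT THREE": "3",
}

def build_tables(entries):
    """Group character names by their Numeric_Value string, taken as-is."""
    tables = {}
    for name, value in entries.items():
        tables.setdefault(value, set()).add(name)
    return tables

def lookup(tables, query):
    """Turn a query such as '0.5' into an exact rational key and look it up."""
    frac = Fraction(query)   # parsed exactly; no binary floating point involved
    key = str(frac)          # 'n/d' for fractions, plain 'n' for integers
    return tables.get(key, set())

tables = build_tables(UCD_SAMPLE)
half = lookup(tables, "0.5")
```

Because the key is a string of two integers, the table is identical on every machine it is copied to, whatever the word size or floating point implementation.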

Prior to the 8.0 beta, the UCD came with all fractions already reduced.
Unicode doesn't promise this, but the mathematical convention is to
specify fractions in irreducible terms, so it would not occur to someone
with a mainly mathematical or computer science background that the input
data would come otherwise. So of course there is no code to handle the
new case. The code thus creates a second table for the property-value
combination of Numeric_Value and 6/12, separate from the one for 1/2,
which causes problems.
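
The failure mode can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual code; build_tables and the two-entry sample are invented for the example, using the values discussed in this thread.

```python
def build_tables(entries):
    """Group character names by Numeric_Value string, with no normalization."""
    tables = {}
    for name, value in entries:
        tables.setdefault(value, []).append(name)
    return tables

# Two characters with the same mathematical value but different spellings of it.
entries = [
    ("VULGAR FRACTION ONE HALF", "1/2"),
    ("MEROITIC CURSIVE FRACTION SIX TWELFTHS", "6/12"),
]

tables = build_tables(entries)
keys = sorted(tables)
```

The two spellings end up under different keys, so a lookup under 1/2 never sees the Meroitic character.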

It's a small matter to add code to reduce the UCD-specified rational
numbers, but it's just one more complication to have to deal with along
with the many that the UCD already presents. If there is no good reason
for the data for these new characters to be specified contrary to
mathematical convention, then the data should be changed instead of
software having to code around it.
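
For reference, the kind of reduction step meant here might look like the following Python sketch. The function name reduce_rational is hypothetical, and it assumes the value is either a plain integer or "numerator/denominator" with integer parts.

```python
from math import gcd

def reduce_rational(value):
    """Reduce a 'numerator/denominator' string to lowest terms.

    Plain integers are returned unchanged; a fraction whose reduced
    denominator is 1 is returned as a plain integer string.
    """
    if "/" not in value:
        return value
    num_text, den_text = value.split("/")
    num, den = int(num_text), int(den_text)
    divisor = gcd(num, den)      # gcd(0, n) is n, so 0/3 reduces to 0/1
    num //= divisor
    den //= divisor
    return str(num) if den == 1 else f"{num}/{den}"

reduced = reduce_rational("6/12")
```

Applied to every value before the tables are built, this would fold 6/12 into the existing table for 1/2.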
>
> I would say that fractions should not be reduced to their lowest form
> in the Unicode data as some people may need to order fractions by
> numerator or denominator, and reducing to lowest form could break the
> expectations of some software. Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of "0"
> rather than "0/3".
So there is some precedent for reducing.
>
> Andrew