From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Aug 02 2005 - 12:08:18 CDT
On Thu, 28 Jul 2005, Rick McGowan wrote:
> 74 Change to Default Localization for NaN in CLDR
>
> There has been a request to change the default localization for a NaN from
> the character U+FFFD REPLACEMENT CHARACTER to another representation. The
> NaN floating-point value means "Not a Number", and represents an undefined
> result of a mathematical operation.
Maybe we can discuss this issue on this list first, to avoid missing
something obvious. I think the key question is whether the value of NaN
must be a single character, as currently defined in the prose of the LDML
specification:
"NaN is represented as a single character, typically (\uFFFD). This
character is determined by the localized number symbols."
( http://www.unicode.org/reports/tr35/#Number_Format_Patterns
under the heading "Special values")
I can see reasons for requiring that the general, culturally neutral
symbol for NaN be a single Unicode character (though we really haven't got
a suitable character for it now). I can even see reasons for using the
symbol that has been used in Java. But shouldn't _localization_ aim at
allowing data to be rendered in a format that is understandable to people,
without need for knowing special conventions and with the information
presented in natural language known to each user, if possible, or at least
using abbreviations that they are familiar with?
Thus, it would seem logical to allow any string as the value of NaN and to
assume that typical localized values are strings like "Not a Number" or
"undefined result", in different languages. After this, issue 74 could be
considered in a new context. (Keeping the current default value would be
one option, and "NaN" might be another.)
Am I missing something (obvious or non-obvious) here?
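To make the proposal concrete, here is a minimal sketch in Python of a
formatter whose NaN representation is an arbitrary string chosen per locale.
The table NAN_SYMBOLS, the function format_number and the locale keys are
hypothetical names made up for this illustration; the "en" entry is only an
example string, not a value taken from CLDR.

```python
import math

# Hypothetical per-locale table (an assumption for illustration only).
# "root" holds the current single-character default, U+FFFD; "en" holds
# an example of a full-string localization.
NAN_SYMBOLS = {
    "root": "\ufffd",
    "en": "Not a Number",
}


def format_number(value: float, locale: str = "root") -> str:
    """Format a float, using the locale's NaN string when value is NaN."""
    if math.isnan(value):
        # Fall back to the root symbol when the locale has no entry.
        return NAN_SYMBOLS.get(locale, NAN_SYMBOLS["root"])
    return f"{value:.2f}"


print(format_number(float("nan"), "en"))
print(format_number(1.5, "en"))
```

Nothing in such a lookup depends on the value being one character long, so
the restriction would have to come from some other use of the symbol.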
I would expect that the value of NaN will mostly be used in
localized output from numerical calculations and diagnostic messages.
In diagnostic messages (assuming that program execution is, for some
reason, aborted due to a computation producing NaN) I would expect the
value of NaN to appear more or less standalone, so it could be of any
reasonable length. But is the one-character requirement based on an idea
of filling a numeric output field of some prescribed width by a character,
used in as many copies as needed for the fill? (Much like we used to see
fields filled with **** in FORTRAN output.) For _such_ purposes, the NaN
indicator would need to be a single character. However, would it make sense
to localize such data? (If the restriction is indeed based on such
considerations and if localization is regarded as useful, I think the LDML
specification should explain this, to make it easier for people to make
reasonable proposals on what the value might be in some locale.)
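The field-filling scenario could look roughly like the following Python
sketch. It is only a guess at the kind of use that might motivate the
one-character rule, not something the LDML specification describes; the
function format_field and its parameters are hypothetical.

```python
import math


def format_field(value: float, width: int, nan_symbol: str = "\ufffd") -> str:
    """Format value into a fixed-width field.

    A NaN fills the whole field with copies of a one-character symbol,
    which is the only case where the symbol's length matters.
    """
    if math.isnan(value):
        if len(nan_symbol) != 1:
            raise ValueError("field fill needs a single-character NaN symbol")
        return nan_symbol * width
    text = f"{value:.2f}"
    if len(text) > width:
        # FORTRAN-style overflow indicator: a field of asterisks.
        return "*" * width
    return text.rjust(width)


print(format_field(3.14159, 8))
print(format_field(123456.789, 6))
print(format_field(float("nan"), 6, "?"))
```

A multi-character value such as "NaN" cannot fill a field of arbitrary
width in this way, which is why the sketch rejects it.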
Similar considerations apply to infinity (with the exception that there is
a widely known reasonable one-character default value for it; but people
might still find a word, like "infinity", more widely understood).
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/