From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Feb 17 2010 - 18:01:54 CST
Michel Bottin responded:
> On 17/02/10 23:16, John H. Jenkins wrote:
> > The Roman numerals are there for the sake of compatibility with
> > older standards only and their use should be avoided. It's better
> > to simply build the Roman numerals you want to use out of the
> > appropriate Latin letters.
> >
> But then we lack the numeric order.
The numerical ordering of Roman numerals is not in the scope of
the Unicode Standard, nor, for that matter, even in the scope of
the Unicode Collation Algorithm.
> For example for the numbers 1-24,
> 30, 40, 50, 60, 70, 80, 90, 100 the collating sequence of the Latin
> letters gives:
>
> C, I, II, III, IV, IX, L, LX, LXX, LXXX, M, V, VI, VII, VIII, X, XC, XI,
> XII, XIII, XIV, XIX, XL, XV, XVI, XVII, XVIII, XX, XXI, XXII, XXIII,
> XXIV, XXX
So?
You get the same kind of mishmash for every acrophonic or other
letter-based numerical system out there, including Greek letters
used as numerals and Hebrew letters used as numerals.
> and then for the kings of France, "Louis IX" (Saint Louis) precedes
> "Louis V" and a hypothetical "Louis XIX" would have preceded "Louis XIV"!
No, the *string* "Louis IX" collates as less than the *string* "Louis V"
(either in binary order or in UCA collation order), but then the
*string* "Table 10" collates as less than the *string* "Table 5"
(either in binary order or in UCA collation order), too. Such
problems of ordering of numerals embedded in text are not addressed
by a character encoding.
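
To make the comparison concrete, here is a minimal sketch in Python
(a language chosen only for illustration). It assumes plain string
comparison by code point, which for these ASCII-only strings matches
binary order; it does not run the UCA itself.

```python
# Plain string comparison works character by character. It knows nothing
# about the numeric value of "IX", "V", "10", or "5".
names = ["Louis V", "Louis IX"]
tables = ["Table 5", "Table 10"]

sorted_names = sorted(names)    # 'I' (U+0049) sorts before 'V' (U+0056)
sorted_tables = sorted(tables)  # '1' (U+0031) sorts before '5' (U+0035)

print(sorted_names)
print(sorted_tables)
```

Both lists come out in the "wrong" numeric order for the same reason:
the comparison is decided at the first differing character, whether
that character is a Latin letter or a decimal digit.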
> I understand the restriction of use for compatibility, but I think that
> we really lack, at least, the following figures necessary to write
> every Roman numeral:
>
> I, II, III, IV, V, VI, VII, VIII, IX, X, XL, L, XC, C, CM, M
>
> encoded each as a unique character in a continuous sequence, with
> corresponding numeric properties.
Not only would this fail to address the full scope of the problem
of numerals embedded in text (Roman or otherwise) -- it would just
further complicate the problem of representation of Roman numerals
in Unicode by putatively adding a *third* way to represent them.
That is something further guaranteed to confuse people, rather than
clarifying anything.
If you want to make progress on handling numerals in text, the
obvious alternative is to work with marked-up text instead, where
numerals can be unambiguously tagged with their scope and
exact values.
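
A rough sketch of what that could look like, in Python. The
<num value="..."> element is a hypothetical markup convention invented
for this example, not taken from any existing schema, and the helper
names sort_key and display are likewise illustrative. The sketch
assumes every entry contains exactly one tagged numeral.

```python
import re

# Hypothetical markup: <num value="N">...</num> records the exact value
# of the numeral, independently of how it is written.
entries = [
    'Louis <num value="14">XIV</num>',
    'Louis <num value="9">IX</num>',
    'Louis <num value="5">V</num>',
]

NUM = re.compile(r'<num value="(\d+)">([^<]*)</num>')


def sort_key(entry):
    """Return (text before the numeral, tagged integer value)."""
    match = NUM.search(entry)  # assumes a tagged numeral is present
    return (entry[:match.start()], int(match.group(1)))


def display(entry):
    """Strip the markup, keeping only the numeral as written."""
    return NUM.sub(r"\2", entry)


ordered = [display(e) for e in sorted(entries, key=sort_key)]
print(ordered)
```

The ordering is driven by the tagged value, so it comes out as
Louis V, Louis IX, Louis XIV, and it would work the same way for
Greek or Hebrew letter numerals without any new characters.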
--Ken