From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Oct 28 2005 - 12:21:09 CST
Regarding Roman numerals, the combining numerals needed to form large 
numbers are still missing, i.e. the combining C on the right, and the combining 
turned C on the left. These should combine with a central I. Alternatively, 
the combining C on the right could be a combining C and reversed C, added 
after the central I.
The existing CD thousand numeral is in fact a ligature of a central I and 
the two symbols. A better (less confusing) name would have been CID (where 
the D represents the reversed C, and is not confusing because the Roman 
numeral D cannot immediately follow the Roman numeral I except when used in 
combination after a leading C), rather than CD, which means 400.
If one prefers, we could avoid encoding the central I, by using the existing 
CD thousand numeral for meaning 1000, and adding combining numerals after it 
(or after D meaning 500) to multiply its value by 10.
So the multiplier by 10 could be a single combining character: it would 
have the form of a half circle combining on the right if it follows the 
Roman D numeral base character, and the form of a surrounding full circle if 
it follows the Roman CD-thousand numeral base character.
For now, it is impossible to represent correctly and consistently the Roman 
numbers 5,000 and 10,000 (made with double left half-circles or double 
circles), 50,000 and 100,000 (made with triple left half-circles or triple 
circles)...
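As a rough illustration of the pattern just described, here is a minimal 
Python sketch that builds Latin-letter approximations of these numerals, 
writing D in place of the turned C. The function names are hypothetical, and 
the convention (n arcs after a central I, optionally matched by n C's before 
it) is an assumption drawn from the description above, not an encoding 
proposal.

```python
def half_numeral(arcs: int) -> str:
    """Return I followed by `arcs` right arcs (D stands in for the turned C).

    One arc gives 500, two give 5,000, three give 50,000.
    """
    if arcs < 1:
        raise ValueError("at least one arc is required")
    return "I" + "D" * arcs


def full_numeral(arcs: int) -> str:
    """Return the fully enclosed form: `arcs` C's, I, then `arcs` D's.

    One pair gives 1,000, two give 10,000, three give 100,000.
    """
    if arcs < 1:
        raise ValueError("at least one arc is required")
    return "C" * arcs + "I" + "D" * arcs


def value(numeral: str) -> int:
    """Compute the value of a string produced by the two helpers above."""
    left = len(numeral) - len(numeral.lstrip("C"))    # leading C's
    right = len(numeral) - len(numeral.rstrip("D"))   # trailing D's
    well_formed = numeral == "C" * left + "I" + "D" * right
    if not well_formed or right < 1 or left not in (0, right):
        raise ValueError(f"not a well-formed numeral: {numeral!r}")
    base = 10 ** (right + 2)      # CID = 1,000, CCIDD = 10,000, ...
    return base if left == right else base // 2


for arcs in (2, 3):
    for numeral in (half_numeral(arcs), full_numeral(arcs)):
        print(f"{numeral:8} = {value(numeral):,}")
```

The loop prints the four values mentioned above (5,000 and 10,000, then 
50,000 and 100,000).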
The only approximate alternative is to not use the existing Roman numerals 
at all and revert to Latin letters, using C, I, and OPEN O (which 
looks quite similar to the turned C, except that the serif is missing on 
the bottom leg when drawn with serif fonts), or to replace the sequence 
<I,TURNED C> by <D>, and possibly add joiner controls between them to 
request (and maybe force) their ligature.
So to represent 888,888, you have to write the following sequences with 
Latin letters instead of Roman numerals (I add spaces between what should be 
combining sequences to make the number easier to read, but these spaces 
should not be present, and I use D after I instead of a combining TURNED-C 
after I):
IDDDD CCCIDDD CCCIDDD CCCIDDD = 800,000
IDDD CCIDD CCIDD CCIDD = 80,000
IDD CID CID CID = 8,000
D C C C = 800
L X X X = 80
V I I I = 8
This results in the compact string:
IDDDDCCCIDDDCCCIDDDCCCIDDDIDDDCCIDDCCIDDCCIDDIDDCIDCIDCIDDCCCLXXXVIII
which would be much easier to read if it actually used the ligatures of 
combining sequences.
--------
Another thing that is missing is the representation of thousand multiples: 
it can be either a combining M, stretched above the complete sequence that 
it multiplies, or a combining macron that is also stretched over the 
complete sequence it multiplies. (Note that there can be several multipliers 
stacked above the sequence, which should be a Roman number between 1 and 
999).
Using a macron or double macron is very confusing. Try representing 
888,888,888 with them, and you'll get something rendered like:
____________
____________ ____________
DCCCLXXXVIII DCCCLXXXVIII DCCCLXXXVIII
This notation was invented after the first one; it is even easier to 
read, and allows writing much larger numbers in a way quite similar to the 
modern thousand groups in the positional decimal system.
But to encode it more correctly, one should be able to encode the 
thousand multiplier directly (I denote it with ° below):
DCCCLXXXVIII°DCCCLXXXVIII°DCCCLXXXVIII
It should be rendered as a macron applied above all previous Roman numerals. 
Alternatively, if one wants to limit the backward string lookup for 
rendering, maybe we could encode instead:
DCCCLXXXVIII°°DCCCLXXXVIII°DCCCLXXXVIII
(i.e. the longest string of base characters before the diacritic would be 
DCCCLXXXVIII, i.e. between 1 and 12 base characters).
Note that if we don't encode the thousand multiplier at all, then the value 
of the string would be ambiguous (although it would not be ambiguous in the 
example above).
For example, compare C°C°C (which represents 100,100,100) with CCC, 
which represents 300.
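To make the ambiguity concrete, here is a minimal Python sketch that 
evaluates a string written with the proposed multiplier, using ° as the 
placeholder exactly as in this message. The helper names are hypothetical. It 
assumes the first encoding shown above (each ° multiplies everything written 
before it by 1,000) and plain Roman numbers between 1 and 999 in each group; 
the doubled °° variant is not handled.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500}


def roman_group(group: str) -> int:
    """Value of a plain Roman number (1 to 999); subtractive forms allowed."""
    total = 0
    for index, letter in enumerate(group):
        current = ROMAN[letter]
        following = ROMAN[group[index + 1]] if index + 1 < len(group) else 0
        # A smaller numeral before a larger one is subtracted (IV, XC, ...).
        total += -current if current < following else current
    if not 1 <= total <= 999:
        raise ValueError(f"group out of range 1..999: {group!r}")
    return total


def with_multipliers(text: str, mark: str = "°") -> int:
    """Evaluate groups separated by the (placeholder) thousand multiplier."""
    total = 0
    for group in text.split(mark):
        # Everything accumulated so far sits under one more multiplier.
        total = total * 1000 + roman_group(group)
    return total


print(with_multipliers("C°C°C"))   # three groups of 100
print(with_multipliers("CCC"))     # a single group of 300
```

Without the multiplier character the two strings collapse to the same 
letters, which is the ambiguity in question.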
The only current alternative, using the existing simple macrons in Unicode, 
is very hard to compose, unnecessarily lengthy and error-prone (here I also 
use ° to denote this Unicode combining macron):
D°°C°°C°°C°°L°°X°°X°°X°°V°°I°°I°°I°°D°C°C°C°L°X°X°X°V°I°I°I°DCCCLXXXVIII
(This sort of string transformation would be better performed by 
the rendering engine, before font lookup.)
Also this does not allow representing the multiplier as a stretched M above 
each thousand group.
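The string transformation mentioned above can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions: ° again 
stands for the hypothetical thousand multiplier (first encoding, one ° 
between groups), the function name is made up, and U+0304 COMBINING MACRON 
is the existing character used for the expanded form. Whether two stacked 
macrons actually render well depends on the font.

```python
COMBINING_MACRON = "\u0304"  # U+0304 COMBINING MACRON


def expand_multipliers(text: str, mark: str = "°") -> str:
    """Rewrite 'group°group°group' as letters carrying per-letter macrons.

    The leftmost group gets the most macrons on each letter, and the last
    group gets none, mirroring the long sequence shown above.
    """
    groups = text.split(mark)
    pieces = []
    for index, group in enumerate(groups):
        macrons = COMBINING_MACRON * (len(groups) - 1 - index)
        pieces.append("".join(letter + macrons for letter in group))
    return "".join(pieces)


expanded = expand_multipliers("DCCCLXXXVIII°DCCCLXXXVIII°DCCCLXXXVIII")
# Show the result with ° in place of each combining macron, as done above.
print(expanded.replace(COMBINING_MACRON, "°"))
```

The compact form has 38 characters; the expanded form has 72, which gives an 
idea of why composing it by hand is tedious.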
Philippe.
----- Original Message ----- 
From: "Michael Everson" <everson@evertype.com>
To: "Unicode Discussion" <unicode@unicode.org>
Sent: Friday, October 28, 2005 5:17 PM
Subject: Re: Roman Numerals (was Re: Improper grounds for rejection of 
proposal N2677)
> At 19:00 +0400 2005-10-28, Andrew S wrote:
>>Michael Everson wrote:
>>>  You should use the regular Latin letters.
>>Why?
>
> Fine. Do what you want, if you don't want to take my advice.
> -- 
> Michael Everson * http://www.evertype.com