Re: Roman Numerals (was Re: Improper grounds for rejection of proposal N2677)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Oct 28 2005 - 12:21:09 CST

    Regarding Roman numerals, there are still missing combining numerals to form
    the large numbers, i.e. the combining C on the right, and the combining
    turned C on the left. These should combine with a central I. Alternatively,
    the combining C on the right could be a combining C and reversed C, added
    after the central I.

    The existing CD thousand numeral is in fact a ligature of a central I and
    the two symbols. A better (less confusing) name would have been CID (where
    the D represents the reversed C, and is not confusing because the Roman
    numeral D cannot immediately follow the Roman numeral I except when used in
    combination after a leading C), rather than CD, which means 400.

    If one prefers, we could avoid encoding the central I by using the existing
    CD thousand numeral to mean 1000, and adding combining numerals after it
    (or after D meaning 500), each one multiplying its value by 10.

    So the multiplier by 10 could be a single combining character: it would
    have the form of a half circle combining on the right if it follows the
    Roman D numeral base character, and the form of a surrounding full circle if
    it follows the Roman CD-thousand numeral base character.
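
    To make the intended arithmetic concrete, here is a minimal Python sketch.
    It is only an illustration: the "^" placeholder for the proposed combining
    multiplier and the function name multiplied_value are hypothetical, since
    no such combining character is encoded. U+2180 is the existing CD thousand
    numeral.

```python
ONE_THOUSAND_C_D = "\u2180"  # ROMAN NUMERAL ONE THOUSAND C D

# Base characters that the hypothetical multiplier may follow.
BASE_VALUES = {"D": 500, ONE_THOUSAND_C_D: 1000}


def multiplied_value(sequence: str, mark: str = "^") -> int:
    """Value of one base numeral followed by zero or more x10 marks.

    `mark` is a placeholder for the proposed combining multiplier; it is an
    assumption of this sketch, not an encoded character.
    """
    base, marks = sequence[:1], sequence[1:]
    if base not in BASE_VALUES or set(marks) - {mark}:
        raise ValueError(f"not a base numeral plus multipliers: {sequence!r}")
    # Each mark multiplies the base value by 10.
    return BASE_VALUES[base] * 10 ** len(marks)
```

    With this reading, D followed by one mark is 5,000, and the CD thousand
    numeral followed by two marks is 100,000.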

    For now, it is impossible to represent correctly and consistently the Roman
    numbers 5,000 and 10,000 (made with double left half-circles or double
    circles), 50,000 and 100,000 (made with triple left half-circles or triple
    circles)...

    The only approximate alternative is to not use the existing Roman numerals
    at all, and revert to Latin letters, and then use C, I, and OPEN O (which
    looks quite similar to the turned C, except that the serif is missing on
    the bottom leg, when drawn with serif fonts), or to replace the sequence
    <I,TURNED C> by <D>, and possibly add joiner controls between them to
    request (and maybe force) their ligature.

    So to represent 888,888, you have to write the following sequences with
    Latin letters instead of Roman numerals (I add spaces between what should be
    combining sequences to make the number easier to read, but these spaces
    should not be present, and I use D after I instead of a combining TURNED-C
    after I):

    IDDDD CCCIDDD CCCIDDD CCCIDDD = 800,000
    IDDD CCIDD CCIDD CCIDD = 80,000
    IDD CID CID CID = 8,000
    D C C C = 800
    L X X X = 80
    V I I I = 8

    This results in the compact string:
    IDDDDCCCIDDDCCCIDDDCCCIDDDIDDDCCIDDCCIDDCCIDDIDDCIDCIDCIDDCCCLXXXVIII
    which would be much easier to read if it actually used the ligatures of
    combining sequences.
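
    A rough Python sketch of this letter-by-letter decomposition follows. It
    takes the number of C and D letters from the rule stated earlier (one
    enclosing pair for 1,000, two for 10,000, three for 100,000, and one more
    D than that for 5,000, 50,000 and 500,000). The function name
    apostrophus_units is hypothetical, D stands in for the turned C, and the
    purely additive writing of each digit (IIII rather than IV) is an
    assumption of the sketch.

```python
def apostrophus_units(n: int) -> list:
    """Split n (1..999,999) into additive units, largest first.

    Assumptions of this sketch: D stands in for the turned C, and every
    digit is written additively (IIII rather than IV).
    """
    if not 0 < n < 1_000_000:
        raise ValueError("this sketch only covers 1..999,999")
    units = []
    # Thousands and above: 10**k is I enclosed by k-2 pairs of C ... D,
    # and 5 * 10**k is I followed by k-1 D's.
    for k in (5, 4, 3):
        digit = n // 10**k % 10
        one = "C" * (k - 2) + "I" + "D" * (k - 2)
        five = "I" + "D" * (k - 1)
        if digit >= 5:
            units.append(five)
            digit -= 5
        units.extend([one] * digit)
    # Below 1,000 the ordinary letters are used.
    for k, one, five in ((2, "C", "D"), (1, "X", "L"), (0, "I", "V")):
        digit = n // 10**k % 10
        if digit >= 5:
            units.append(five)
            digit -= 5
        units.extend([one] * digit)
    return units


# Spaced form for reading, compact form for the actual string.
print(" ".join(apostrophus_units(888_888)))
print("".join(apostrophus_units(888_888)))
```

    Joining the units with spaces gives the readable form; joining them with
    nothing gives the compact string.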

    --------

    Another thing that is missing is the representation of thousand multiples:
    it can be either a combining M, stretched above the complete sequence that
    it multiplies, or a combining macron that is also stretched over the
    complete sequence it multiplies. (Note that there can be several multipliers
    stacked above the sequence, which should be a Roman number between 1 and
    999).

    Using a macron or double macron is very confusing. Try representing
    888,888,888 with them, and you'll get something rendered like:

    ____________
    ____________ ____________
    DCCCLXXXVIII DCCCLXXXVIII DCCCLXXXVIII

    This notation was invented after the first one, as it is even easier to
    read, and allows writing much larger numbers in a way quite similar to the
    modern thousand groups in the positional decimal system.

    But to encode it more correctly, one should be able to encode directly the
    thousand multiplier (I note it with ° below):
    DCCCLXXXVIII°DCCCLXXXVIII°DCCCLXXXVIII
    It should be rendered as a macron applied above all preceding Roman numerals.
    Alternatively, if one wants to limit the backward string lookup for
    rendering, maybe we could encode instead:
    DCCCLXXXVIII°°DCCCLXXXVIII°DCCCLXXXVIII
    (i.e. the longest string of base characters before the diacritic would be
    DCCCLXXXVIII, i.e. between 1 and 12 base characters).

    Note that if we don't encode the thousand multiplier at all, then the value
    of the string would be ambiguous (although it would not be ambiguous in the
    example above).
    For example, look at C°C°C (which represents 100,100,100) and compare it to
    CCC, which represents 300.
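
    The two readings of the ° notation can be sketched in Python as below.
    This is only an illustration under my own assumptions: the function names
    are hypothetical, ° is the same stand-in character used in this message,
    and each group is read with the usual letter values.

```python
import re

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def group_value(group: str) -> int:
    """Value of one plain group of Roman letters (subtractive pairs allowed)."""
    total = 0
    for ch, nxt in zip(group, group[1:] + " "):
        v = VALUES[ch]
        # A smaller letter directly before a larger one is subtracted.
        total += -v if VALUES.get(nxt, 0) > v else v
    return total


def value_cumulative(text: str, mark: str = "°") -> int:
    """First reading: each mark multiplies everything before it by 1000."""
    *leading, last = text.split(mark)
    total = 0
    for group in leading:
        total = (total + group_value(group)) * 1000
    return total + group_value(last)


def value_per_group(text: str, mark: str = "°") -> int:
    """Second reading: a run of n marks multiplies only its own group by 1000**n."""
    pairs = re.findall(rf"([IVXLCDM]+)((?:{re.escape(mark)})*)", text)
    return sum(group_value(g) * 1000 ** (len(m) // len(mark)) for g, m in pairs)
```

    Under the first reading C°C°C gives 100,100,100 while CCC gives 300, which
    is exactly the distinction that is lost when the multiplier is not encoded.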

    The only current alternative, using the existing simple macrons in Unicode,
    is very hard to compose, unnecessarily lengthy and error-prone (here I also
    use ° to denote this Unicode combining macron):

    D°°C°°C°°C°°L°°X°°X°°X°°V°°I°°I°°I°°D°C°C°C°L°X°X°X°V°I°I°I°DCCCLXXXVIII

    (This sort of string transformation would be better performed by the
    rendering engine, before font lookup.)
    Also, this does not allow representing the multiplier as a stretched M above
    each thousand group.
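
    The expansion itself is mechanical, as this small Python sketch shows. The
    function name expand_macrons is hypothetical, and using U+0304 COMBINING
    MACRON repeated once per level is an assumption about which existing
    character one would pick.

```python
COMBINING_MACRON = "\u0304"  # U+0304 COMBINING MACRON


def expand_macrons(groups, mark=COMBINING_MACRON):
    """Attach marks to every letter of each thousand group.

    `groups` lists the groups from most to least significant; the last
    group gets no mark, the one before it one mark per letter, and so on.
    """
    top = len(groups) - 1
    pieces = []
    for index, group in enumerate(groups):
        marks = mark * (top - index)
        pieces.append("".join(letter + marks for letter in group))
    return "".join(pieces)


# Passing "°" as the mark gives the visible stand-in notation used here.
print(expand_macrons(["DCCCLXXXVIII"] * 3, mark="°"))
```

    For three groups of 12 letters this produces 36 combining marks for only
    36 base letters, which shows how lengthy the workaround is.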

    Philippe.

    ----- Original Message -----
    From: "Michael Everson" <everson@evertype.com>
    To: "Unicode Discussion" <unicode@unicode.org>
    Sent: Friday, October 28, 2005 5:17 PM
    Subject: Re: Roman Numerals (was Re: Improper grounds for rejection of
    proposal N2677)

    > At 19:00 +0400 2005-10-28, Andrew S wrote:
    >>Michael Everson wrote:
    >>> You should use the regular Latin letters.
    >>Why?
    >
    > Fine. Do what you want, if you don't want to take my advice.
    > --
    > Michael Everson * http://www.evertype.com
