Re: Hexadecimal digits

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jun 08 2010 - 13:51:39 CDT

     "Mark Davis ☕" <mark@macchiato.com> wrote:
    > This topic is not particularly relevant to Unicode. Could people please
    > carry on this discussion on a different list? There are internet groups
    > devoted to hexadecimal and other topics (eg the adoption of Shavian by the
    > United Nations) where communities of like-minded people can be found.
    > On Tue, Jun 8, 2010 at 09:22, Luke-Jr <luke@dashjr.org> wrote:
    >
    > > On Tuesday 08 June 2010 10:53:15 am John Dlugosz wrote:
    > > > Yes, when discussing values in hex, this is an English problem. What do I
    > > > call the useful higher powers and groups? What is the equivalent of
    > > > "thousands" or "millions" to refer to powers of 65536 or 4294967296?
    > >
    > > Seriously, these questions are all answered in the book...
    > >
    > > (written using "classical" hexadecimal digits)
    > > 0=Noll 1=An 2=De 3=Te 4=Go 5=Su 6=By
    > > 7=Ra 8=Me 9=Ni A=Ko b=Hu C=Vy d=La
    > > E=Po F=Fy 10=Ton 100=San 1000=Mill 1,0000=Bong
    > > 1,0000,0000=Tam 1,0000,0000,0000=Song 1,0000,0000,0000,0000=Tran
    > > 2,8d5b,7E0F=Detam, memill - lasan - suton - hubong, ramill-posanfy
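
    The naming scheme quoted above can be sketched in a few lines of
    Python. This is only an illustration inferred from the single worked
    example (2,8d5b,7E0F): the function names are hypothetical, and the
    rules for zero digits, for a coefficient of one, and for where the
    group word attaches are assumptions, not taken from the book. The
    sketch returns a list of words and does not try to reproduce the
    hyphenation of the quoted example.

```python
# Digit and power names exactly as listed in the quoted table.
DIGITS = ["Noll", "An", "De", "Te", "Go", "Su", "By", "Ra",
          "Me", "Ni", "Ko", "Hu", "Vy", "La", "Po", "Fy"]
POWERS = ["", "Ton", "San", "Mill"]            # 16^0 .. 16^3 inside a group
GROUPS = ["", "Bong", "Tam", "Song", "Tran"]   # 16^0, 16^4, 16^8, 16^12, 16^16


def group_words(value, suffix):
    """Name one non-zero group of four hex digits, followed by its group word."""
    words = []
    for power in (3, 2, 1, 0):
        digit = (value >> (4 * power)) & 0xF
        if digit == 0:
            continue  # assumption: zero digits are skipped ("ramill-posanfy")
        name = POWERS[power]
        if name and digit == 1:
            words.append(name)  # assumption: 1 x power is the bare power word (10=Ton)
        else:
            words.append(DIGITS[digit] + name.lower())
    if suffix:
        if value == 1:
            return [suffix]                 # the table lists 1,0000=Bong on its own
        if value & 0xF:
            words[-1] += suffix.lower()     # "Hu" + "bong", "De" + "tam"
        else:
            words.append(suffix)            # assumption: otherwise a separate word
    return words


def tonal_words(n):
    """Return the list of words naming a non-negative integer below 16**20."""
    if n < 0 or n >= 16 ** 20:
        raise ValueError("value outside the range covered by the quoted table")
    if n == 0:
        return ["Noll"]
    groups = []
    index = 0
    while n:
        groups.append((n & 0xFFFF, GROUPS[index]))
        n >>= 16
        index += 1
    words = []
    for value, suffix in reversed(groups):
        if value:
            words.extend(group_words(value, suffix))
    return words


print(" ".join(tonal_words(0x28D5B7E0F)))
```

    For the quoted value the words come out in the same order as in the
    example above; only the joining and capitalization differ.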

    This last message is certainly more on topic: it discusses
    existing characters and their usage in some experimental (mostly
    written) language (I don't know exactly which one; it may be just the
    language used by the creator of this system), and the related
    localization issues (which could also interest CLDR localizers), even
    if they are used by a very small minority. It also helps in
    understanding what other issues could arise with other older numeric
    systems.

    And the 8 characters discussed here (for digits 8..15) are certainly
    good candidates for a possible encoding proposal, even if they will
    certainly not fit in the BMP (they could easily fit in the SMP, and
    their General_Category will certainly not be gc=Nd but gc=No). But
    I have no opinion on whether the first 8 digits (for numeric values
    0..7) should also be re-encoded.

    Also, there's no problem in using characters with different gc values
    in the same numeric system (after all, this is already the case in the
    common [0-9a-fA-F]* notation, which mixes gc=Nd, gc=Ll, and gc=Lu, or
    in other Indic or African scripts, where there may also exist
    additional digits with gc=No for fractions of unity).
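
    The mix of general categories in ordinary hex notation is easy to
    check with Python's standard unicodedata module. The helper name
    describe is just for illustration, and U+00BD (the vulgar fraction
    one half) stands in here as a familiar gc=No character; it is not one
    of the Indic or African fraction digits mentioned above.

```python
import unicodedata


def describe(ch):
    """Return (General_Category, numeric value or None) for one character."""
    return unicodedata.category(ch), unicodedata.numeric(ch, None)


# A decimal digit, a lowercase and an uppercase hex letter, and a fraction.
for ch in "7fF\u00bd":
    category, value = describe(ch)
    print(f"U+{ord(ch):04X} gc={category} numeric={value}")
```

    The hex letters carry no numeric value in the character database at
    all, which is why software treats them as digits only by convention.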

    There's no extra character needed for the three positional powers of
    16 and the 4 positional powers of 16^4 used in the number names: this
    is no different from the powers of 1000 in the decimal positional
    system used in European languages, or the powers of 10000 used in
    some Asian languages, and it is not a problem here for naming the
    characters.

    Note that the glyph used for one of those digits resembles the digit 9
    (with which it is fully confusable), but it has a distinct numeric
    value (for this reason, it should be encoded separately, because of
    its distinct abstract identity).

    However, I'm not sure which script they should be assigned to. For
    me this should be the same script property as the existing digits 0..9
    (of ASCII), with which they are used together in sequences of
    arbitrary order. Maybe they could be encoded as arbitrary hex digits,
    with the code positions U+1xxx0..U+1xxx7 left free and assigned only
    later, if there are similar hexadecimal or octal systems that can be
    unified with them as having the same abstract properties (those should
    also be given gc=No and not gc=Nd, due to their specific usage). But
    here this would be a "political decision": the glyph, even if it is
    not mandatory in its exact form, is still part of the character
    identity when there's limited possibility for variation and it is
    impossible to swap them, so other possible candidate systems could
    easily choose to reuse the glyphs of the existing ASCII digits 8..9
    with their current values, and this would conflict with the
    assignment of these 8 characters for the "Ton-al" system.

    This discussion correctly describes what could be candidate names for
    the 8 candidate characters to encode as U+1xxx8..U+1xxxF, if this
    "Ton-al" system had to be supported (there may be some interest from
    some ISO members in doing that for use in their public libraries, in
    their digitizing efforts). In fact this set is rather complete and
    well documented, so there are no real difficulties.

    The fact that this system did not have success (in its time) does not
    mean it is of no interest (after all, other extinct scripts were
    encoded, but because there's an active community using them at least
    for linguistic, archaeological or religious research). But here, it is
    not really needed to help understand an old civilization, since the
    system was created and explained in another modern language and
    culture that does not need it. Still, there may be interest in
    reproducing the books, publications and products displaying those 8
    characters.

    And recent inventions were encoded as well (notably currency
    symbols, and soon there will be emojis), so the age of these
    characters is not so much a factor in the decision to encode them or
    not.

    Certainly there will not be broad font support for them, and few
    fonts will be updated only to include them given the very small usage,
    but small fonts could be easily and rapidly created containing only
    the 8 common digits and the 8 supplementary digits, plus possibly some
    punctuation.

    Before that, it is easy to encode them in the PUA and register them in
    the CSUR prior to future adoption and encoding in the SMP. A font
    displaying them as PUA characters should remain named/tagged as "Beta"
    or "PUA" (this could be "Tonal Digits PUA") and be replaced later by
    another similar font (with glyphs renumbered using the SMP
    assignments, and a name matching the assigned block name), or by a
    font containing other standard subsets of numeric/maths characters,
    digits and symbols.
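
    The later renumbering from PUA to assigned code points applies to
    stored text as well as to fonts, and it is mechanical. A minimal
    sketch in Python follows, assuming the 8 characters sit at consecutive
    code points; PUA_BASE is a hypothetical choice (not a CSUR
    allocation), and the target base is left as a parameter because no
    code points have been assigned.

```python
import unicodedata

# Hypothetical interim location: the first BMP Private Use code point.
PUA_BASE = 0xE000


def is_private_use(ch):
    """True if the character has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def make_remap(old_base, new_base, count=8):
    """Build a str.translate table moving `count` consecutive code points."""
    return {old_base + i: new_base + i for i in range(count)}


def migrate(text, old_base, new_base, count=8):
    """Renumber interim private-use characters to their assigned positions."""
    return text.translate(make_remap(old_base, new_base, count))
```

    Characters outside the remapped range, such as the ASCII digits used
    alongside them, pass through unchanged.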

    Philippe.


