From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jun 08 2010 - 13:51:39 CDT
"Mark Davis ☕" <mark@macchiato.com> wrote:
> This topic is not particularly relevant to Unicode. Could people please
> carry on this discussion on a different list? There are internet groups
> devoted to hexadecimal and other topics (eg the adoption of Shavian by the
> United Nations) where communities of like-minded people can be found.
> On Tue, Jun 8, 2010 at 09:22, Luke-Jr <luke@dashjr.org> wrote:
>
> > On Tuesday 08 June 2010 10:53:15 am John Dlugosz wrote:
> > > Yes, when discussing values in hex, this is an English problem. What do I
> > > call the useful higher powers and groups? What is the equivalent of
> > > "thousands" or "millions" to refer to powers of 65536 or 4294967296?
> >
> > Seriously, these questions are all answered in the book...
> >
> > (written using "classical" hexadecimal digits)
> > 0=Noll 1=An 2=De 3=Te 4=Go 5=Su 6=By
> > 7=Ra 8=Me 9=Ni A=Ko b=Hu C=Vy d=La
> > E=Po F=Fy 10=Ton 100=San 1000=Mill 1,0000=Bong
> > 1,0000,0000=Tam 1,0000,0000,0000=Song 1,0000,0000,0000,0000=Tran
> > 2,8d5b,7E0F=Detam, memill - lasan - suton - hubong, ramill-posanfy
This last message is certainly more on topic here: it discusses
existing characters and their usage in an experimental (mostly
written) language (I don't know exactly which one; maybe just the
language used by the initial creator of this system), and the related
localization issues (which could also interest CLDR localizers), even
if they are used by a very small minority. It also helps in
understanding issues that may arise with other, older numeric systems.
And the 8 characters discussed here (for digits 8..15) are certainly
good subjects for a possible encoding proposal, even if they will
certainly not fit in the BMP (they could easily fit in the SMP, and
their general category will certainly not be gc=Nd but gc=No). But
I have no opinion on whether the first 8 digits (for numeric values
0..7) should also be re-encoded.
Also, there's no problem in using characters with different gc in the
same numeric system (after all, this is already the case in the common
[0-9a-fA-F]* notation, where there are gc=Nd, gc=Ll, and gc=Lu, or with
other Indic or African scripts, where there may also exist additional
digits with gc=No for fractions of unity).
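To illustrate the mixed general categories, here is a minimal sketch
using Python's standard unicodedata module. The helper name
general_categories is made up for this example, and U+00BD (one half)
is only my own pick of a familiar gc=No character, not one taken from
the scripts mentioned above.

```python
import unicodedata

def general_categories(text):
    """Return the set of Unicode general categories used in text."""
    return {unicodedata.category(ch) for ch in text}

# One hexadecimal numeral written with ordinary ASCII characters
# mixes decimal digits, lowercase letters and uppercase letters.
print(sorted(general_categories("2a8D5b7E0F")))  # ['Ll', 'Lu', 'Nd']

# A fraction character: gc=No, with a numeric value but no digit value.
half = "\u00bd"
print(unicodedata.category(half), unicodedata.numeric(half))  # No 0.5
```

Only the characters with gc=Nd carry a decimal digit value in the
character database; the letters used as hex digits have none, yet the
notation works.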
There's no extra character needed for the three positional powers of
16 and the 4 positional powers of 16^4 used in the number names: this
is no different from the case of powers of 1000 in the decimal
positional system used in European languages, or the powers of 10000
used in some Asian languages, so this is not a problem here for
naming the characters.
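As a rough sketch of how the number names combine digit names with the
positional suffixes, the following Python reproduces the quoted example.
The rules are only inferred from that single example and the quoted
table; the function names, the lowercase spelling, the punctuation
between words, and the handling of a group whose last digit is zero are
all my own assumptions.

```python
DIGITS = ["noll", "an", "de", "te", "go", "su", "by", "ra",
          "me", "ni", "ko", "hu", "vy", "la", "po", "fy"]
PLACES = ["mill", "san", "ton", ""]            # 16^3, 16^2, 16^1, 16^0 in a group
GROUPS = ["", "bong", "tam", "song", "tran"]   # 16^0, 16^4, 16^8, 16^12, 16^16

def group_words(value, group_name):
    """Name one group of four hex digits (value < 16^4)."""
    words = []
    for shift, place in zip((12, 8, 4, 0), PLACES):
        digit = (value >> shift) & 0xF
        if digit == 0:
            continue  # zero digits are skipped, as in "posanfy" for E0F
        # The units digit of a group carries the group name ("hubong").
        words.append(DIGITS[digit] + (place if place else group_name))
    if group_name and value & 0xF == 0:
        words.append(group_name)  # assumption: not covered by the example
    return words

def tonal_name(n):
    """Spell a non-negative integer below 16^20 with the quoted names."""
    if n < 0 or n >= 16 ** 20:
        raise ValueError("outside the range covered by the quoted names")
    if n == 0:
        return DIGITS[0]
    groups = []
    index = 0
    while n:
        n, value = divmod(n, 16 ** 4)
        if value:
            groups.append(group_words(value, GROUPS[index]))
        index += 1
    return ", ".join("-".join(words) for words in reversed(groups))

print(tonal_name(0x28D5B7E0F))
# detam, memill-lasan-suton-hubong, ramill-posan-fy
```

Everything is built from the 16 digit names plus 7 suffix words, which
is why only the digits themselves need characters.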
Note that the glyph used for one of those digits resembles the digit 9
(with which it is fully confusable), but it has a distinct numeric
value (for this reason, it should be encoded separately, because of
its distinct abstract identity).
However, I'm not sure which script they should be assigned to. For
me this should be the same script property as the existing digits 0..9
(of ASCII), with which they are used together in sequences of arbitrary
order. Maybe they could be encoded as arbitrary hex digits, and the
code positions U+1xxx0..U+1xxx7 should be left free, and assigned only
later if there are similar hexadecimal or octal systems and they can
be unified for having the same abstract properties (those should
also be given gc=No and not gc=Nd, due to their specific usage). But
here this would be a "political decision": the glyph, even if it is
not mandatory in its exact form, is still part of the character
identity when there's limited possibility for variation and it is
impossible to swap them, so other possible candidate systems could
easily choose to reuse the glyphs of the existing ASCII digits 8..9
with their current values, and this would conflict with the assignment
of these 8 characters for the "Ton-al" system.
This discussion correctly describes what could be candidate names for
the 8 candidate characters to encode as U+1xxx8..U+1xxxF, if this
"Ton-al" system had to be supported (there may be some interest from
some ISO members in doing that for use in their public libraries, in
their digitizing efforts). In fact this set is rather complete and well
documented, so there are no real difficulties.
The fact that this system did not succeed (in its time) does not
mean it is without interest (after all, other extinct scripts were
encoded, but because there's an active community using them, at least
for linguistic, archaeological or religious research). But here, it is
not really needed to help understand an old civilization, since the
system was created and explained in another modern language and
culture that does not need it. Still, there may be interest in
reproducing the books, publications and products displaying those 8
characters.
And recent inventions were encoded as well (notably currency
symbols, and soon there will be emojis), so the age of these characters
is not much of a factor in the decision to encode them or not.
Certainly there will not be broad support from fonts containing them
or being updated only to include them, given the very small usage, but
small fonts could be created easily and rapidly containing only the 8
common digits and the 8 supplementary digits, plus possibly some
punctuation.
Before that, it is easy to encode them with PUAs and register them in
the CSUR prior to future adoption and encoding in the SMP. A font
displaying them as PUAs should remain named/tagged as "Beta" or "PUA"
(this could be "Tonal Digits PUA") and be replaced later by another
similar font (with glyphs renumbered using SMP assignments, and a name
matching the assigned block name), or by a font containing other
standard subsets of numeric/maths characters, digits and symbols.
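A small sketch of what such an interim PUA encoding and later
renumbering could look like in Python. The base code point U+E000 and
the function names are purely hypothetical (nothing is registered in
the CSUR or assigned by Unicode), and the sketch follows the assumption
above that only the digits 8..15 get new characters while 0..7 stay
ASCII.

```python
import unicodedata

PUA_BASE = 0xE000  # hypothetical block start, not a registered assignment

def tonal_to_pua(hex_text, base=PUA_BASE):
    """Replace hex digits 8..F by private-use characters at base+8..base+15."""
    out = []
    for ch in hex_text:
        value = int(ch, 16)  # raises ValueError for non-hex input
        out.append(chr(base + value) if value >= 8 else ch)
    return "".join(out)

def retarget(text, old_base, new_base):
    """Renumber the 8 supplementary digits from one block start to another."""
    table = {old_base + v: new_base + v for v in range(8, 16)}
    return text.translate(table)

encoded = tonal_to_pua("7E0F")
print([f"U+{ord(ch):04X}" for ch in encoded])
# ['U+0037', 'U+E00E', 'U+0030', 'U+E00F']
print(unicodedata.category(encoded[1]))  # Co (private use)
```

Once real code points exist, retarget(text, PUA_BASE, new_base) would
move existing data over, with new_base being whatever block start is
eventually assigned.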
Philippe.