Encoding of old compatibility characters
jr at qsm.co.il
Mon Mar 27 17:43:17 CDT 2017
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Frédéric Grosshans
Sent: Tuesday, March 28, 2017 1:05 AM
Subject: Re: Encoding of old compatibility characters
Another example, about to be encoded, is the GROUP MARK, used on old IBM computers (proposal: ML threads:
http://www.unicode.org/mail-arch/unicode-ml/y2015-m01/0040.html , and http://unicode.org/mail-arch/unicode-ml/y2007-m05/0367.html )
On 27/03/2017 at 23:46, Frédéric Grosshans wrote:
> An example of a legacy character successfully encoded recently is ⏨
> U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
> It came from the Soviet standard GOST 10859-64 and the German standard
> ALCOR, and was proposed by Leo Broukhis in this proposal
> http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . It follows a
> discussion on this mailing list here
> http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html, where
> Ken Whistler was already sceptical about the usefulness of this encoding.
> On 27/03/2017 at 16:44, Charlotte Buff wrote:
>> I’ve recently developed an interest in old legacy text encodings and
>> noticed that there are various characters in several sets that don’t
>> have a Unicode equivalent. I had already started research into these
>> encodings to eventually prepare a proposal until I realised I should
>> probably ask on the mailing list first whether it is likely the UTC
>> will be interested in those characters before I waste my time on a
>> project that won’t achieve anything in the end.
>> The character sets in question are ATASCII, PETSCII, the ZX80 set,
>> the Atari ST set, and the TI calculator sets. So far I’ve only
>> analyzed the ZX80 set in great detail, revealing 32 characters not in
>> the UCS. Most characters are pseudo-graphics, simple pictographs or
>> inverted variants of other characters.
>> Now, one of Unicode’s declared goals is to enable round-trip
>> compatibility with legacy encodings. We’ve accumulated a lot of weird
>> stuff over the years in the pursuit of this goal. So it would be
>> natural to assume that the unencoded characters from the mentioned
>> sets would also be eligible for inclusion in the UCS. On the other
>> hand, those encodings are for the most part older than Unicode and so
>> far there seems to have been little interest in them from the UTC or
>> WG2, or any of their contributors. Something tells me that if these
>> character sets were important enough to consider for inclusion, they
>> would have been encoded a long time ago along with all the other
>> stuff in Block Elements, Box Drawings, Miscellaneous Symbols etc.
>> Obviously the character sets in question don’t receive much use
>> nowadays (and some weren’t even that relevant in their time, either),
>> which leads me to wonder whether putting further work into this
>> proposal would be worth it.
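For readers unfamiliar with the round-trip compatibility goal mentioned in the quoted message, here is a minimal sketch of what it means in practice: every byte of a legacy encoding should decode to a Unicode character and encode back to the same byte. The function name round_trips is hypothetical, and cp437 is only a stand-in chosen because Python ships a codec for it; it is not one of the character sets discussed in this thread.

```python
def round_trips(codec: str) -> bool:
    """Return True if every single-byte value survives bytes -> str -> bytes.

    `codec` is any single-byte codec name known to Python. The sets named in
    the thread (ZX80, PETSCII, ...) are not used here; this is an
    illustration of the concept only.
    """
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            # This byte has no Unicode mapping in the codec, so the
            # encoding cannot be represented losslessly.
            return False
        if text.encode(codec) != raw:
            # The byte decoded, but did not come back unchanged.
            return False
    return True
```

Calling round_trips("cp437") checks all 256 byte values of that code page, while round_trips("ascii") fails as soon as it reaches a byte above 0x7F. A legacy set with characters missing from the UCS would fail the same way if a codec were written for it, which is the gap the quoted message describes.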