Invalid char display (was: Using hex numbers considered a geek attitude)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Apr 27 2001 - 13:40:41 EDT


Karl,

There is a character set missing from Unicode. Unicode needs a special hex
display font.

When I worked on Fujitsu systems, they had a special way of displaying
characters that were not in the font. They had an extended DBCS system
called JEF, and when you attempted to display a character that was not in
the font, the hex value was displayed:

                        XX
                        XX

The hex digits were half size so that they would display in the same size as
a kanji character. Although you could not see the character, you could send
a screen shot to a tech and get real help. You could also look the
character up in a book to see what the character was if it was really
important.
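
As a rough illustration of that fallback, here is a minimal Python sketch.
The function name hex_box is made up for this example, and the choice to put
the first two digits on the top row is an assumption; I am not claiming this
is how the JEF firmware actually laid them out.

```python
def hex_box(code_point: int) -> str:
    """Return a two-line stand-in glyph for a 16-bit character code.

    Hypothetical helper: the top row holds the first two hex digits and
    the bottom row the last two (an assumed layout).
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit codes")
    digits = f"{code_point:04X}"  # always four uppercase hex digits
    return digits[:2] + "\n" + digits[2:]


print(hex_box(0x4E2D))
# prints:
# 4E
# 2D
```

A screen shot of such a box is enough for a tech to identify the character.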

I found that the use of hex rather than decimal was a plus. Most
importantly, the character could be displayed with 4 hex digits. In decimal
it would have taken 5 digits, which would not have fit very well. Secondly,
the hex representation is only a tag for the character. It could
have been "G13Z" and it would not have made any difference. The only
challenge was that the user at times might have to look up a character.
The ordering is not difficult to explain, as long as you assume that
digits sort before letters.
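
Both points can be checked with a few lines of Python. This is only a
sketch: the sample tags are arbitrary values picked for the example, and the
sorting claim assumes fixed-width, uppercase tags.

```python
MAX_16_BIT = 0xFFFF

hex_width = len(f"{MAX_16_BIT:X}")    # width of the largest code in hex
decimal_width = len(str(MAX_16_BIT))  # width of the same code in decimal
print(hex_width, decimal_width)       # prints: 4 5

# Arbitrary sample tags, all four digits wide and uppercase.
tags = ["4E2D", "00A9", "3042", "FF21", "0041"]

# Plain string sorting puts '0'-'9' before 'A'-'F', so a reader who only
# knows "digits first, then letters" gets the same order as numeric order.
by_text = sorted(tags)
by_value = sorted(tags, key=lambda tag: int(tag, 16))
print(by_text == by_value)            # prints: True
```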

The real benefit to hex is that it better maps the actual binary layout of
Unicode. In most cases the user will have no knowledge of the Unicode
encoding. In fact they probably will not know that it is Unicode at all.
If Unicode were documented in decimal then the implementers who really are
concerned would have to be constantly converting the numbers to hex to use
them.
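
To make the mapping concrete, here is a small Python sketch. The helper
name nibbles is invented for the example, and U+4E2D is just a sample
character. Each hex digit corresponds to exactly four bits, and the hex
digits of a 16-bit code point are literally its big-endian UTF-16 bytes;
the decimal form shows neither.

```python
def nibbles(code_point: int) -> list:
    """Split a 16-bit code point into four 4-bit groups, one per hex digit."""
    hex_digits = f"{code_point:04X}"
    return [f"{int(digit, 16):04b}" for digit in hex_digits]


cp = 0x4E2D  # sample code point chosen for illustration

print(nibbles(cp))                              # prints: ['0100', '1110', '0010', '1101']
print(chr(cp).encode("utf-16-be").hex().upper())  # prints: 4E2D
print(cp)                                       # prints: 20013
```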

I think that they learned from the ASCII charts that use decimal numbers to
express the code points. They are extremely frustrating to use.

It would be even worse to list all the code points in both hex and decimal.
It would create a lot of confusion. I just wish that the W3C folks had not
introduced decimal character encoding in HTML. Nobody but geeks looks at
the HTML source.
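
For reference, HTML accepts a numeric character reference in either form,
decimal (&#NNNN;) or hex (&#xHHHH;). The Python sketch below builds both for
one sample character and decodes them with the standard html module; the
helper names decimal_ref and hex_ref are made up for the example.

```python
import html


def decimal_ref(ch: str) -> str:
    """Build the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Build the hex numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


sample = chr(0x4E2D)  # sample character chosen for illustration

print(decimal_ref(sample))  # prints: &#20013;
print(hex_ref(sample))      # prints: &#x4E2D;

# Both references decode to the same character.
print(html.unescape(decimal_ref(sample)) == html.unescape(hex_ref(sample)))  # prints: True
```

Only the hex form can be matched against the code charts without a
conversion step.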

Carl

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of Karl Pentzlin
Sent: Friday, April 27, 2001 4:04 AM
To: unicode@unicode.org
Subject: Using hex numbers considered a geek attitude (was: Re: Decimal
Unicodepoints)

On Wednesday, 25 April 2001 at 06:04, 11digitboy@bolt.com wrote:

1bc> Why don't you make the next print edition of the Unicode
1bc> standard (not to mention online) with Unicodepoints
1bc> in decimal as well as hex?

In fact, I do not see any reason to use hex numbers in documents
released for the general public. In my opinion, future print editions
should use decimal numbers *primarily*.

Of course, hex numbers are a concept which all computer and coding
experts know as well as ordinary (i.e. decimal) numbers. They have
advantages if you have to discuss technical details of communications
on the bit or octet stream level - but I cannot see any other
advantage. They are legacy. They are a habit of the experts inherited
from the time when 8-bit or 16-bit entities were strong constraints
for code tables - now, the possible number of Unicode points is not
even a power of 2.
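
The arithmetic behind that last remark can be checked with a short Python
sketch. The names are invented for the example; the only fact assumed is
that the code space runs from U+0000 to U+10FFFF.

```python
CODE_SPACE_SIZE = 0x10FFFF + 1  # code points U+0000 through U+10FFFF


def is_power_of_two(n: int) -> bool:
    """True when n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


print(CODE_SPACE_SIZE)                   # prints: 1114112
print(is_power_of_two(CODE_SPACE_SIZE))  # prints: False
print(CODE_SPACE_SIZE == 17 * 0x10000)   # prints: True
```

The total is 17 blocks of 65536 code points each, so only the size of a
single block is a power of 2.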

On the other hand, "numbers" (i.e. decimal numbers) are a concept
everybody is familiar with. Thus, why not say to the public simply,
"Unicode gives every character a number", instead of geek speak like
"Unicode gives every character a code point, and as we are very cute,
we use a special numbering system with 16 digits designed for
computer experts, and to use Unicode you have to become an expert too
and have to learn this system"?

Once "character numbers" are in common use, every Chinese businessman,
for example, can spell his name to every secretary in the world simply by
telling them the numbers. And every secretary can enter the name
correctly without any knowledge of Chinese, having no more to learn
than the single function key to be held down during the digit input
(which may be standardized for future keyboards). And (maybe the
strongest argument) they have no chance to confuse decimal numbers with
hex numbers which happen to have no digits in the A..F range.
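
The ambiguity is easy to demonstrate with a short Python sketch. The
function name possible_values and the sample string "4512" are invented for
the example.

```python
def possible_values(digits: str) -> dict:
    """Read the same digit string once as decimal and once as hex."""
    return {"decimal": int(digits, 10), "hex": int(digits, 16)}


# "4512" contains no A..F digits, so nothing marks it as hex or decimal.
print(possible_values("4512"))  # prints: {'decimal': 4512, 'hex': 17682}
```

The two readings name different characters, which is why a bare number
needs some marker, such as the "U+" prefix, to say which base is meant.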

The fact that these "numbers" have a relevance for the various UTF
encodings is convenient for computer and coding experts. But for
the ordinary user, it has no more relevance than knowing the cylinder
head diameter has for driving a car.

--
Karl Pentzlin
AC&S Analysis Consulting & Software GmbH
München, Germany
mailto:karl-pentzlin@acssoft.de


