Re: Rendering Raised FULL STOP between Digits from Asmus Freytag on 2013-03-09 (Unicode Mail List Archive)

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sat, 09 Mar 2013 11:30:55 -0800

Richard,

the situation with the raised decimal point is a mess in Unicode.

I know that Mark thinks we have too many dots, but the reason this case
is a mess is that the unification with U+002E is both unworkable in
practice and counter to precedent.

The precedent in Unicode is to encode characters separately when they
have different appearance, except when it is fundamentally the "same"
character and the difference in appearance can be determined
unambiguously by "context".

There are two primary kinds of context that Unicode admits here. One is
based on surrounding text (such as positional forms of Arabic letters).
The other is overall stylistic context, such as a font choice (such as
upright vs. slanted integral symbols).

When the appearance of a character is different based on the author's
intent, and two (or more) different appearances can occur in the same
document with different significance, then the usual response by Unicode
has been to encode explicit characters. (The phonetic characters are
full of examples of this, like the lower case a without hook or the
g with hook, both of which need to be distinguishable from other forms
of these letters in phonetics.)

So, if a British document can use both inline dots and raised dots, then
you can't assign a single font to cover both. Well, the thought was,
software might recognize the numeric context. However, as you've pointed
out, section numbers are numeric and do not have the raised dot. In
fact, as far as such documents are concerned, the raised dot itself can
be used by the reader to distinguish decimal numbers from other uses of
numbers separated by dots (something not possible in other languages
that lack this convention).
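
A minimal sketch of why the numeric-context idea fails, in Python. The
function name raise_dots and the sample strings are hypothetical, and
U+00B7 is used here only as a visible stand-in for the raised glyph a
renderer would choose; this is not how any particular rendering engine
works.

```python
import re

# Hypothetical heuristic: any U+002E that sits between two digits is
# assumed to be a decimal point and gets the raised form.
DOT_BETWEEN_DIGITS = re.compile(r"(?<=\d)\.(?=\d)")


def raise_dots(text: str) -> str:
    """Replace each full stop between digits with U+00B7 (stand-in glyph)."""
    return DOT_BETWEEN_DIGITS.sub("\u00b7", text)


decimal = "The rod is 3.14 m long."
section = "See section 3.14 for details."

# Both lines are rewritten identically, because the surrounding text
# carries nothing that separates a decimal number from a section number.
print(raise_dots(decimal))
print(raise_dots(section))
```

The heuristic raises the dot in the section reference just as it does in
the measurement, which is exactly the case where a British document
would keep the dot on the line.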

So, on the face of it, the choice to unify the raised decimal dot with
002E violates the encoding model, by pushing semantic distinctions into
some kind of markup. On top of that, it's not really practical to expect
authors to mark up either all decimal numbers or all section numbers
with separate styles or font bindings. That's something not required
anywhere else.

So far, that's bad enough.

Next, you have the issue that Unicode refused (quite properly) to encode
a generic "decimal separator" character, the appearance of which was
supposed to vary with external context (such as locale or a
document-global style). This suggestion had been intended to allow
numerical expressions to be cut and pasted between documents in
different languages with all numbers formatted correctly without
further editing. That is, the same
character would appear as either comma or period (or raised period)
depending on context.

I wrote that I agreed with the choice not to encode such a special
character for that purpose. However, by not encoding a character for the raised
decimal point, Unicode did an about-face and made 002E a "limited
purpose" version of a "decimal separator". Suddenly, there is a
character that is supposed to have different appearance based on context
- on the line for US documents, off the line for British documents.

This directly violates the precedent established by the refusal to
encode the generic "decimal separator".

What can be done?

I believe the Unicode Standard should be fixed by explicitly removing
all suggestions in the text that the raised decimal point is unified
with 002E.

Second, the standard should be amended by identifying which character is
to be used instead for this purpose.

It might be something like 00B7. In that case, 00B7 would have to have
properties that effectively produce the correct result in numeric
context, while leaving non-numeric context unchanged. I believe that is
entirely possible, and non-disruptive, insofar as numeric use of 00B7
does not exist for any purpose other than showing a raised decimal point
(I suspect there are documents in the wild that already use this
character for that purpose).
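
As a rough illustration of what "properties in numeric context" means in
practice, the sketch below uses Python's standard unicodedata module to
print a few existing properties of U+002E and U+00B7, and checks how a
stock number parser treats each. The helper parses_as_number is
hypothetical; the point is only that software today does not treat 00B7
as numeric, so some change in properties or behavior would be needed.

```python
import unicodedata

FULL_STOP = "\u002e"
MIDDLE_DOT = "\u00b7"

# Print name, general category and bidi class as recorded in the
# Unicode Character Database shipped with this Python build.
for ch in (FULL_STOP, MIDDLE_DOT):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


def parses_as_number(s: str) -> bool:
    """Hypothetical helper: does Python's float() accept this string?"""
    try:
        float(s)
        return True
    except ValueError:
        return False


# float() accepts the full stop as decimal separator but rejects 00B7.
print(parses_as_number("3" + FULL_STOP + "14"))
print(parses_as_number("3" + MIDDLE_DOT + "14"))
```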

If that alternative is deemed not acceptable, the only remaining choice
would be to add a new character. (I would recommend that only as the
last resort).

A./
Received on Sat Mar 09 2013 - 13:32:47 CST
