a character for an unknown character
richard.wordingham at ntlworld.com
Fri Dec 30 06:37:27 CST 2016
On Fri, 30 Dec 2016 01:23:55 +0100 (CET)
Marcel Schneider <charupdate at orange.fr> wrote:
> On Wed, 28 Dec 2016 19:05:17 -0800, Asmus Freytag wrote:
> > On 12/28/2016 5:47 PM, Richard Wordingham wrote:
> U+02BC being shifted from a letter to a punctuation must have been
> anticipated at encoding, since the original recommendation was to use
> it as apostrophe throughout. Unifying the letter apostrophe and the
> punctuation apostrophe made IMO more sense, despite the conflicting
What conflicts? Both prototypically mark absences.
The rationale seems to be that English uses both the punctuation
apostrophe and the U+2019 RIGHT SINGLE QUOTATION MARK. If users aren't
being trained to use U+2212 MINUS SIGN, and habitually disable grammar
and spell-checking, most won't make the right choice between U+02BC and
> Perhaps the letters for hexadecimal digits should have been encoded
The idea has been rejected several times.
> > > 5) The nightmare of spacing single and double dots.
> > ? spacing vs. combining? Not sure what you mean.
> I think Richard refers to U+2024 ONE DOT LEADER and U+2025 TWO DOT
> LEADER, along with U+002E FULL STOP.
That's not the half of it. For starters, just look at the confusables
for U+00B7 MIDDLE DOT:
U+2027 HYPHENATION POINT
U+2219 BULLET OPERATOR
U+22C5 DOT OPERATOR
U+2E31 WORD SEPARATOR MIDDLE DOT
U+30FB KATAKANA MIDDLE DOT
There's an argument that the unification of U+00B7 and U+0387 ANO
TELEIA is a unification too far. A font for Greek may need to work out
which it is to position it correctly.
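For anyone who wants to poke at these, here is a minimal sketch using Python's standard unicodedata module. It lists the name and general category of each dot mentioned above and checks which of them NFC normalization folds into U+00B7. The helper names (describe, nfc_folds_to_middle_dot) are made up for illustration, and the exact results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The middle-dot lookalikes listed above, plus U+0387 GREEK ANO TELEIA.
MIDDLE_DOT_LOOKALIKES = [0x00B7, 0x2027, 0x2219, 0x22C5, 0x2E31, 0x30FB, 0x0387]


def describe(cp):
    """Return (code point label, character name, general category)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))


def nfc_folds_to_middle_dot(cp):
    """True if NFC normalization turns this code point into U+00B7."""
    return unicodedata.normalize("NFC", chr(cp)) == "\u00B7"


if __name__ == "__main__":
    for cp in MIDDLE_DOT_LOOKALIKES:
        label, name, category = describe(cp)
        folded = nfc_folds_to_middle_dot(cp)
        print(f"{label}  {category}  {name}  NFC->U+00B7: {folded}")
```

U+0387 has a canonical singleton decomposition to U+00B7, so any normalized text loses the distinction before a font ever sees it.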
For double dots, there are the confusables for U+003A COLON:
U+05C3 HEBREW PUNCTUATION SOF PASUQ
There's a whole raft of visargas, some of which match and some of
which don't. What happened to the principle that diacritics are unified
by form? I suspect the answer is that encoding was established while
principles were still developing.
> > > As a result, I have no idea whether the singular of "fithp" (one
> > > of Larry Niven's alien species) should be spelt with U+02BC or
> > > U+2019, though in ASCII I can just write "fi'".
> Normally on an English or French keyboard layout, all three are
> accessed on live keys.
That accessibility is news to me - normally I just have to fight a word
processor if I want U+0027. However, I still don't know whether to
spell the word «fiʼ» or «fi’». I've only seen it in print.
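The practical difference between the candidate spellings can be seen with a small sketch, again using only Python's standard library. It reports the name and general category of the final character of each spelling, and whether Python's str.isalpha() treats the whole string as letters. The dictionary and function names are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Three candidate spellings of the word, differing only in the final character.
CANDIDATES = {
    "U+0027": "fi'",
    "U+02BC": "fi\u02BC",
    "U+2019": "fi\u2019",
}


def last_char_info(word):
    """Return (name, general category, whole-word isalpha) for a spelling."""
    ch = word[-1]
    return (unicodedata.name(ch), unicodedata.category(ch), word.isalpha())


if __name__ == "__main__":
    for label, word in CANDIDATES.items():
        name, category, alpha = last_char_info(word)
        print(f"{label}  {category}  isalpha={alpha}  {name}")
```

Only the U+02BC spelling is a run of letters as far as the character properties are concerned; the other two end in punctuation, which is what software that segments or spell-checks words will tend to act on.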