Re: The Cent & Florin Signs VS. C-Slash & Left-Tailed F

From: Richard Gillam (rgillam@jtcsv.com)
Date: Fri Jan 21 2000 - 14:52:13 EST


Peter Constable wrote:

> 5) leave U+0192 as is; add two new characters for florin and
> IAI bilabial f semantics
>
> pros: does not break existing implementations; provides
> unambiguous solution for future implementations; new characters
> have names and semantics appropriate to the two intended
> functions and are located in appropriate blocks; has precedent
> in treatment of hyphen-minus
> cons: inertia of existing implementations will lead to slow
> migration from use of U+0192 to use of new characters, and may
> never be entirely successful; additional work required in
> future implementations in handling (ambiguous) mapping from
> U+0192 in legacy data to new characters

I think it's worth it to point out that the inertia against migrating to a new
behavior will likely be the same with every option on the list except Option 1.
In other words, I don't see a good reason why font and text-analysis-engine
vendors will be any slower about migrating to the solution that introduces two
new characters than they will be in responding to the other choices. All of the
choices involve changes in behavior, glyph shape, or both. In the absence of a
strong customer need (read "lots of people yelling"), or a strong customer need
for some other feature of the same version of Unicode, no one's going to change
anything.

> A couple of ameliorating comments on the cons of option 5: Even
> if new implementations fail to abandon the use of U+0192 for
> florin in favour of a new, unambiguous florin character, there
> is still benefit gained from the addition of an unambiguous
> character for IAI bilabial f. Also, the work required in future
> implementations in handling the ambiguous mapping from U+0192
> to the new characters is not different in any significant way
> from the work required to handle the ambiguity for any existing
> legacy data in options 2, 3 and 4, and for all data in option
> 1.

The phrase "abandon the use of" scares me a little. You can't just start
ignoring U+0192 at some point-- it's a character forever. The only processes
that can abandon U+0192 (and the ones that should) are keyboard layouts and
input methods.

I also tend to disagree with all the talk about "handling the ambiguous mapping
from U+0192 to the new characters." If you encounter a U+0192 in the text, it
simply has ambiguous semantics, just as it does now. Applications can either
continue to handle it the way they do now, which maintains backward
compatibility, or use some kind of heuristic to assign a semantic to the
character (which might work better, but may also break backward compatibility).
Nobody is ever actually mapping from the old code to the appropriate new
code.
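
To make the "some kind of heuristic" alternative concrete, here is a minimal
sketch in Python. It is only an illustration, not something any vendor is known
to do: the function name guess_semantics and the rules it uses (a following
digit suggests a currency sign, a neighboring letter suggests the f-hook
letter) are assumptions made up for this example. Note that it only labels the
character; it does not re-encode anything.

```python
F_HOOK = "\u0192"  # LATIN SMALL LETTER F WITH HOOK, the ambiguous character


def guess_semantics(text, index):
    """Guess how the U+0192 at text[index] is being used.

    Returns "florin", "letter", or "ambiguous". The rules are
    illustrative assumptions, not part of any standard.
    """
    if text[index] != F_HOOK:
        raise ValueError("no U+0192 at that index")

    # Assumed rule 1: a digit right after the sign (optionally after a
    # space or no-break space) suggests a currency amount.
    rest = text[index + 1:].lstrip(" \u00a0")
    if rest[:1].isdigit():
        return "florin"

    # Assumed rule 2: a letter immediately on either side suggests the
    # character is part of a word, i.e. the f-hook letter.
    before = text[index - 1] if index > 0 else ""
    after = text[index + 1] if index + 1 < len(text) else ""
    if before.isalpha() or after.isalpha():
        return "letter"

    # Otherwise leave it alone, which preserves today's behavior.
    return "ambiguous"
```

An application that gets "ambiguous" back would simply keep treating the
character the way it does today, which is the backward-compatible path
described above.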

In particular, fonts continue to use whatever glyph they're using now for U+0192
and new fonts just pick the glyph for florin sign or f-hook as the one to
display for U+0192. A font engine should never be asked to try to guess which
character U+0192 "really" is.

In other words, in my opinion, existing applications, language-processing
packages, and fonts should continue to handle U+0192 exactly the way they do
now, and just add support for the new characters. The only processes that
change are keyboard mappings, to keep users from typing U+0192 in new documents
and allow them to type the new characters. Existing documents continue to
behave the way they do right now-- if a user finds that's wrong somewhere, he
manually goes back and fixes it himself (or runs a utility that does this
automatically).
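
A utility of the kind mentioned in the parenthesis might look roughly like the
following Python sketch. Everything specific in it is an assumption: option 5
does not assign code points to the two new characters, so NEW_FLORIN and
NEW_F_HOOK_LETTER below are Private Use Area placeholders, and the decide
callback (which could be a heuristic or a prompt to the user) is hypothetical.

```python
F_HOOK = "\u0192"  # the existing, ambiguous character

# HYPOTHETICAL placeholders: no code points exist for the proposed
# characters, so Private Use Area values stand in for them here.
NEW_FLORIN = "\ue000"
NEW_F_HOOK_LETTER = "\ue001"


def reencode(text, decide):
    """Return a copy of text with each U+0192 optionally re-encoded.

    decide(text, index) is a caller-supplied function (an assumption of
    this sketch) that returns "florin", "letter", or anything else to
    leave that occurrence untouched.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != F_HOOK:
            out.append(ch)
            continue
        choice = decide(text, i)
        if choice == "florin":
            out.append(NEW_FLORIN)
        elif choice == "letter":
            out.append(NEW_F_HOOK_LETTER)
        else:
            # Undecided: keep U+0192, so the document behaves as before.
            out.append(ch)
    return "".join(out)
```

The point of the design is that the conversion is opt-in and per document: any
occurrence the callback does not decide stays as U+0192, and nothing else in
the text is changed.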

[By the way, since it's now clear that the florin sign and the lower case f with
a hook do _not_, in fact, always look the same, I'm changing my vote from option
1 to option 5.]

--Rich Gillam
  IBM


