From: Pim Blokland (pblokland@planet.nl)
Date: Sun Mar 16 2003 - 08:18:53 EST
Chris Jacobs wrote:
> Mortbats code point 0034 is CANCER
> Arial Unicode MS code point 0034 is DIGIT FOUR
> Arial Unicode MS code point 264B is CANCER
No. First of all, this is the wrong example. This has got nothing to
do with private use characters. Cancer is not a private use
character!
I don't know the Mortbats font, but if it has been designed in
accordance with the rules, it may well map code point U+264B to the
glyph at index #34. That should not cause problems or inconsistencies
for the display system.
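To illustrate the distinction between a code point and a glyph index, here is a minimal sketch in Python. The table SYMBOL_FONT_CMAP and the function glyph_index are hypothetical names made up for this illustration; only the pairing of U+264B with index 34 comes from the example above, and the second entry is invented.

```python
import unicodedata

# Hypothetical character-to-glyph table of a symbol font.
# Keys are Unicode code points, values are glyph indices inside the font.
SYMBOL_FONT_CMAP = {
    0x264B: 34,  # CANCER -> glyph #34 (from the example above)
    0x264C: 35,  # LEO -> glyph #35 (invented for illustration)
}


def glyph_index(cmap, code_point):
    """Return the glyph index for a code point, or 0 if the font has no mapping.

    Glyph 0 is conventionally the "missing glyph" in TrueType fonts.
    """
    return cmap.get(code_point, 0)


if __name__ == "__main__":
    for cp in (0x264B, 0x0034):
        name = unicodedata.name(chr(cp))
        print(f"U+{cp:04X} {name}: glyph #{glyph_index(SYMBOL_FONT_CMAP, cp)}")
```

In this sketch the text always carries U+264B; the number 34 stays an internal detail of the font and never appears in the text itself.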
Secondly, the problem with the PUA is that it should not, and will
not, be subjected to regulations and guidelines. Font designers are
always free to put anything they want in there - characters,
transcoding hints, combining accents, what have you. That is what
the PUA is there for!
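For reference, a small sketch of how one might test whether a code point lies in a private use area at all. The ranges are the three private use ranges defined by Unicode; the names PUA_RANGES and is_private_use are assumptions for this example.

```python
import unicodedata

# The three private use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(code_point):
    """Return True if the code point falls inside one of the private use ranges."""
    return any(low <= code_point <= high for low, high in PUA_RANGES)


if __name__ == "__main__":
    for cp in (0x0034, 0x264B, 0xE000):
        category = unicodedata.category(chr(cp))
        print(f"U+{cp:04X}: private use = {is_private_use(cp)}, category = {category}")
```

Python's unicodedata module reports the general category "Co" for private use code points, which gives the same answer without a hand-written table.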
However, let's take a look at what you really want.
Suppose we have two custom fonts, A and B, both with 256 (custom)
characters, and you want to free yourself of the problems caused by
any overlapping codepoints they may have.
Do you want to be able to tell the system that if you output
character U+E000, for example, it should use font A, and if you
output character U+E100, it should use font B?
What exactly is the use of this?
With a system like this, it would be impossible for, say, text files
or HTML files on the Internet to display characters like this.
Because what would you put in there to output, say, a Tinco? The
writer of the HTML file doesn't know at what codepoint offset you
have installed this Tengwar font.
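The problem can be sketched as follows, assuming a hypothetical system in which each reader assigns installed fonts to PUA offsets locally. The font names, the offsets and the function resolve are all invented for this illustration.

```python
def resolve(code_point, installed_fonts):
    """Return (font name, offset within the font) for a code point, or None.

    installed_fonts is a list of (base code point, number of characters, font name).
    """
    for base, size, name in installed_fonts:
        if base <= code_point < base + size:
            return name, code_point - base
    return None


# Two readers who installed the same two 256-character fonts in a different order.
READER_1 = [(0xE000, 256, "Tengwar"), (0xE100, 256, "FontB")]
READER_2 = [(0xE000, 256, "FontB"), (0xE100, 256, "Tengwar")]

if __name__ == "__main__":
    # The author wrote U+E000, meaning the first character of the Tengwar font.
    for label, fonts in (("reader 1", READER_1), ("reader 2", READER_2)):
        print(label, resolve(0xE000, fonts))
```

The same code point lands in a different font on the second machine, and nothing in the file itself can prevent that.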
A better approach would be to find a way to agree on the *names* for
the new characters.
A scenario could be envisioned where an XML file (or even HTML)
would contain the name of the font in a <FONT...> command; the
system would read this info, load the font and extract its name
table; and after this point, the file can contain entries like
"&Tinco;" which the system then can display, provided there is a
character named "Tinco" in the font, of course!
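A rough sketch of that scenario, assuming the system has already loaded the font and extracted its name table. The table TENGWAR_NAMES, its code points and the function expand_entities are hypothetical; a real implementation would read the names from the font file.

```python
import re

# Hypothetical name table extracted from a Tengwar font:
# character name -> the PUA code point this particular font happens to use.
TENGWAR_NAMES = {"Tinco": 0xE000, "Parma": 0xE001}

# Matches entity-style references such as "&Tinco;".
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text, name_table):
    """Replace each "&Name;" with the character the font assigns to Name.

    References to names the font does not contain are left untouched.
    """
    def substitute(match):
        code_point = name_table.get(match.group(1))
        return match.group(0) if code_point is None else chr(code_point)

    return ENTITY.sub(substitute, text)


if __name__ == "__main__":
    expanded = expand_entities("&Tinco;&Parma; and &Unknown;", TENGWAR_NAMES)
    print([f"U+{ord(ch):04X}" for ch in expanded[:2]], expanded[2:])
```

The file only ever mentions names, so the code points a given font uses internally no longer matter to the author of the file.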
(Note: this may not be as straightforward as it sounds. For one
thing, the <FONT > tag has been deprecated. And the names of
characters in TrueType fonts are PostScript names, not HTML names,
so that a character like "periodcentered" should be addressed as
"&middot;". But these are details, details...)
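The naming mismatch can be made concrete with a short sketch. It uses Python's standard html.entities table for the HTML names; the small POSTSCRIPT_NAMES excerpt and the function html_entity_for are assumptions for this example, not part of any font library.

```python
from html.entities import name2codepoint

# Small excerpt of PostScript glyph names with their code points.
POSTSCRIPT_NAMES = {
    "periodcentered": 0x00B7,
    "copyright": 0x00A9,
    "eacute": 0x00E9,
}

# Invert the standard HTML entity table: code point -> HTML entity name.
HTML_NAME_BY_CODE_POINT = {cp: name for name, cp in name2codepoint.items()}


def html_entity_for(postscript_name):
    """Return the HTML entity for a PostScript glyph name, or None if unknown."""
    code_point = POSTSCRIPT_NAMES.get(postscript_name)
    if code_point is None:
        return None
    html_name = HTML_NAME_BY_CODE_POINT.get(code_point)
    return None if html_name is None else f"&{html_name};"


if __name__ == "__main__":
    for ps_name in POSTSCRIPT_NAMES:
        print(ps_name, "->", html_entity_for(ps_name))
```

Some names coincide between the two sets and others do not, so a system along these lines would need a translation step of this kind.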
Pim Blokland