From: Doug Ewell (dewell@adelphia.net)
Date: Wed Mar 19 2003 - 02:32:46 EST
Pim Blokland <pblokland at planet dot nl> wrote:
> Now what you do in the privacy of your own home is none of our
> concern, but when communicating with the outside world, there are
> certain rules and guidelines you should abide by. And one of those
> guidelines is that a plaintext file should not have PUA characters in
> it, unless its author also specifies it should be displayed using
> a certain font.
Not exactly. There must be an agreement between sender and receiver
(author and reader, pitcher and catcher, whatever) as to how the PUA
characters are to be interpreted. This doesn't necessarily involve a
specific font, just one that follows that particular PUA interpretation.
For example, my invented script has been proposed for ConScript in the
range from U+E690 through U+E6CF. It's supported in James Kass's Code2000
font using this range. In the future, a font by Michael Everson will
also be available, using the same range, and then there will be (at
least) two fonts supporting the proposed ConScript range. At that time,
it will no longer be crucial whether users are using Code2000 or
Michael's font; both will display the same text, and the only
differences will be in aesthetics and style (as it should be with
fonts).
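A small Python check, using only the standard `unicodedata` module, illustrates why the agreement matters more than the font: Unicode itself classifies these code points as private use (general category `Co`) and gives them no character names, so the range bounds below (taken from the message) carry meaning only through the ConScript proposal. The names `RANGE_START`, `RANGE_END`, and `describe` are illustrative choices, not part of any registry or font.

```python
import unicodedata

# Proposed ConScript range for the invented script, as given in the message.
RANGE_START, RANGE_END = 0xE690, 0xE6CF
# The Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF


def describe(cp):
    """Return (general category, formal name or None) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.name(ch, None)


# The proposed block lies entirely inside the BMP Private Use Area.
assert PUA_START <= RANGE_START <= RANGE_END <= PUA_END

print(RANGE_END - RANGE_START + 1)  # 64 code points
print(describe(RANGE_START))        # ('Co', None): private use, no standard name
```

Any font that maps the same 64 code points to the same letters (Code2000 today, another font later) will render the same text, because nothing in Unicode itself competes with that private assignment.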
> Lastly, I must say I think it's a pity that the suggestion I made
> yesterday has been ignored so quietly. You know, in an HTML
> environment, to retrieve names for characters from the font file
> itself, to relieve the author from the task of having to enter
> numerical values.
> For an example, suppose you have a font named "Tengwar Quenya", with
> a character named "hwesta" at U+E00B, you could use it in an XML
> file by defining an entity, <!ENTITY hwesta "">. Now my
> suggestion was the browser program which displays this file should
> be able to look at the font information in the XML file, open the
> font file and retrieve the names of all characters in it, so it can
> show the "&hwesta;" character (and all other characters) without
> needing a long list of ENTITY entries in the XML.
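For reference, the conventional approach Pim wants to avoid can be sketched with Python's standard `xml.etree.ElementTree` parser: every name needs its own `<!ENTITY>` declaration in the document's internal DTD subset. The `TENGWAR_NAMES` table and both helper functions are hypothetical illustrations, not part of any font or browser. Under Pim's proposal the table would instead be filled from the glyph names stored in the font file, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical name-to-code-point table. In Pim's proposal this would be read
# from the font's own glyph names; here it is hard-coded for illustration.
TENGWAR_NAMES = {
    "hwesta": 0xE00B,
}


def entity_declarations(names):
    """Build one <!ENTITY> declaration per character name."""
    return "\n".join(
        f'<!ENTITY {name} "&#x{cp:04X};">' for name, cp in sorted(names.items())
    )


def parse_with_entities(body, names):
    """Wrap an XML body in a DOCTYPE whose internal subset declares the entities."""
    doc = f"<!DOCTYPE doc [\n{entity_declarations(names)}\n]>\n<doc>{body}</doc>"
    return ET.fromstring(doc)


root = parse_with_entities("&hwesta;", TENGWAR_NAMES)
print(hex(ord(root.text)))  # 0xe00b
```

The declaration list grows by one line per character, which is the "long list of ENTITY entries" the proposal is trying to eliminate.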
There have been lots of attempts to define short mnemonic names or
"entities" for Unicode. SGML names are one. The "i18nrep
repertoiremap," originally defined in RFC 1345 and more recently used in
ISO/IEC TR 14652, is another. These schemes work well for a relatively
small number of characters, say a thousand, but become unwieldy and
anti-mnemonic when applied to a larger set of characters. There simply
aren't enough short mnemonic names to go around.
It's possible that the name "hwesta" might catch on for this particular
Tengwar letter, and then the scenario Pim describes might work (although
asking a browser to interpret the internal structure of a font file
seems excessive to me). But the same mechanism is less likely to work
on other scripts, where character names are less likely to be easily,
uniquely abbreviated (e.g. many scripts have a character called KA or
VIRAMA).
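The KA and VIRAMA collisions are easy to confirm against the Unicode Character Database that ships with Python's `unicodedata` module. The sketch below lists BMP characters whose formal names end in `LETTER KA` or `SIGN VIRAMA`; it matches only those exact spellings, so it undercounts (some scripts name their virama differently), and the totals depend on the Unicode version Python was built with. The helper name `chars_with_suffix` is illustrative.

```python
import unicodedata


def chars_with_suffix(suffix, limit=0x10000):
    """Return (code point, name) for every BMP character whose name ends with suffix."""
    hits = []
    for cp in range(limit):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name.endswith(suffix):
            hits.append((cp, name))
    return hits


for suffix in (" LETTER KA", " SIGN VIRAMA"):
    hits = chars_with_suffix(suffix)
    print(f"{suffix.strip()}: {len(hits)} characters")
    for cp, name in hits[:5]:
        print(f"  U+{cp:04X} {name}")
```

Every match shares the same final word, so a bare mnemonic like "ka" or "virama" cannot identify one character without some script qualifier.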
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Wed Mar 19 2003 - 03:22:22 EST