From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Oct 20 2003 - 07:56:59 CST
Chris Jacobs wrote:
> [...]
> Nevertheless I think if Unicode don't want to decide how the
> PUA is to be interpreted
Please take notice of this "interpreted": I'll come back to this soon.
> it should be at the very least provide a mechanism by which
> an user of the PUA can specify which specification he
> prefers.
>
> I plan to propose such a mechanism:
>
> I want to propose a char with the following properties:
>
> Scalar Value: U+E0002
>
> This starts a PUA interpretation
Again, please take notice of this "interpretation".
> selector tag. The content of the tag is a Font family
> name. For all PUA chars between this tag and the
> corresponding Cancel tag the copyright holder of the font
> is the sole authority about how the PUA should
> be interpreted.
Again, "interpreted"...
> Any comments?
Yes.
A font tells me how a certain run of text should be *displayed* in rich
text, not how it should be *interpreted* in plain text.
Imagine that I have been asked to write a function AreTheseLetters() which
gets a string argument (i.e., a piece of plain text) and returns a Boolean
value indicating whether all the characters in it are letters.
For non-PUA characters, I already implemented this using Unicode's "General
Category" property: I decided that all characters whose General Category is
"L*" are "letters". My default assumption about PUA characters is that they
are not letters.
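As a minimal sketch of such a function (Python and its standard unicodedata module are used here purely for illustration; the name are_these_letters is just a transliteration of AreTheseLetters() and not anything from an actual implementation):

```python
import unicodedata


def are_these_letters(text: str) -> bool:
    """Return True if every character's General Category starts with 'L'."""
    # Private Use characters carry General Category "Co", so they are
    # treated as non-letters by default, matching the assumption above.
    # Note: an empty string yields True, since all() of nothing is True.
    return all(unicodedata.category(ch).startswith("L") for ch in text)
```

With this sketch, a string such as "Marco" passes, while a string containing U+E017 or U+E009 does not, because nothing in the General Category data says what those code points are.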
So far so good. Now I want to use your PUA Plane 14 tags, if present, to
override the above assumption about PUA characters. E.g., imagine that my
string contains this:
(U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E006F U+0E0062 U+0E0061
U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)
This is what I am going to do:
1) I parse the tags at the beginning of the string and save the relevant
information in a temporary variable which we will call PuaInterpretation;
2) I remove the tags.
Now, my PuaInterpretation variable contains the following information:
Foobar.ttf
And my string contains the following text:
(U+E017 U+E009)
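Steps 1 and 2 might be sketched roughly as follows. This assumes the proposed (hypothetical) U+E0002 selector, the existing U+E007F cancel tag, and that tag characters U+E0020..U+E007E map to ASCII by subtracting 0xE0000; the helper name split_pua_tag is invented for illustration, and the sketch ignores tag scoping entirely.

```python
TAG_BASE = 0xE0000       # offset between tag characters and ASCII
PUA_SELECTOR = 0xE0002   # hypothetical selector tag from the proposal
CANCEL_TAG = 0xE007F     # existing CANCEL TAG character


def split_pua_tag(text: str) -> tuple[str, str]:
    """Return (pua_interpretation, text_without_tags)."""
    name_chars = []
    remaining = []
    for ch in text:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            # Tag character: recover the ASCII character it shadows.
            name_chars.append(chr(cp - TAG_BASE))
        elif TAG_BASE <= cp <= CANCEL_TAG:
            # Other Plane 14 tag controls (selector, cancel, ...): drop them.
            continue
        else:
            remaining.append(ch)
    return "".join(name_chars), "".join(remaining)


# Build the sample string: tag controls, the tagged name, then the PUA text.
sample = (
    chr(TAG_BASE)
    + chr(PUA_SELECTOR)
    + "".join(chr(TAG_BASE + ord(c)) for c in "Foobar.ttf")
    + chr(CANCEL_TAG)
    + "\ue017\ue009"
)
pua_interpretation, plain = split_pua_tag(sample)
```

After this runs, pua_interpretation holds only the font name and plain holds only the two PUA characters; nothing in either value carries character properties.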
Now, what's the next step? What am I supposed to do to find out whether,
according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
are letters or not?
_ Marco