RE: PUA

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Oct 20 2003 - 07:56:59 CST


Chris Jacobs wrote:
> [...]
> Nevertheless I think if Unicode don't want to decide how the
> PUA is to be interpreted

Please take notice of this "interpreted": I'll come back to this soon.

> it should be at the very least provide a mechanism by which
> an user of the PUA can specify which specification he
> prefers.
>
> I plan to propose such a mechanism:
>
> I want to propose a char with the following properties:
>
> Scalar Value: U+E0002
>
> This starts a PUA interpretation

Again, please take notice of this "interpretation".

> selector tag. The content of the tag is a Font family
> name. For all PUA chars between this tag and the
> corresponding Cancel tag the copyright holder of the font
> is the sole authority about how the PUA should
> be interpreted.

Again, "interpreted"...

> Any comments?

Yes.

A font tells me how a certain run of text should be *displayed* in rich
text, not how it should be *interpreted* in plain text.

Imagine that I have been asked to write a function AreTheseLetters() which
gets a string argument (i.e., a piece of plain text) and returns a Boolean
value indicating whether all the characters in it are letters.

For non-PUA characters, I already implemented this using Unicode's "General
Category" property: I decided that all characters whose General Category is
"L*" are "letters". My default assumption about PUA characters is that they
are not letters.
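
A minimal Python sketch of such a function, assuming the standard
unicodedata module as the source of the General Category (the name
are_these_letters is only illustrative):

```python
import unicodedata


def are_these_letters(text: str) -> bool:
    """Return True if every character in text has a General Category of L*."""
    # Private-use characters have General Category "Co", so they are
    # treated as non-letters by default. An empty string yields True.
    return all(unicodedata.category(ch).startswith("L") for ch in text)
```

With this sketch, are_these_letters("abc") is True, while a string made of
U+E017 and U+E009 gives False, because both are reported as "Co".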

So far so good. Now I want to use your PUA Plane-14 tags, if present, to
override the above assumption about PUA characters. E.g., imagine that my
string contains this:

        
        (U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E006F U+0E0062 U+0E0061
        U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)

This is what I am going to do:

1) I parse the tags at the beginning of the string and save the relevant
information in a temporary variable which we will call PuaInterpretation;

2) I remove the tags.

Now, my PuaInterpretation variable contains the following information:

        Foobar.ttf

And my string contains the following text:

        
        (U+E017 U+E009)
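
Steps 1 and 2 could be sketched in Python roughly as follows. This is only
an illustration under stated assumptions: U+E0002 is the selector proposed
in the quoted message (not an assigned character), U+E007F is taken to end
the tag name as in the example string, and the names split_pua_tag,
TAG_SELECTOR and TAG_CANCEL are hypothetical.

```python
TAG_SELECTOR = 0xE0002  # hypothetical "PUA interpretation" tag from the proposal
TAG_CANCEL = 0xE007F    # CANCEL TAG; assumed here to end the tag name


def split_pua_tag(text):
    """Return (interpretation name or None, text with Plane-14 tags removed)."""
    name_chars = []
    plain_chars = []
    in_name = False
    for ch in text:
        cp = ord(ch)
        if cp == TAG_SELECTOR:
            in_name = True
        elif cp == TAG_CANCEL:
            in_name = False
        elif 0xE0020 <= cp <= 0xE007E:
            # Tag characters mirror ASCII 0x20..0x7E at an offset of 0xE0000.
            if in_name:
                name_chars.append(chr(cp - 0xE0000))
        elif 0xE0000 <= cp <= 0xE007F:
            pass  # any other code point in the tag block is simply dropped
        else:
            plain_chars.append(ch)
    return ("".join(name_chars) or None, "".join(plain_chars))


# Build the example: leading U+E0000, selector, tagged name, cancel, two PUA chars.
tagged_name = "".join(chr(0xE0000 + ord(c)) for c in "Foobar.ttf")
example = chr(0xE0000) + chr(TAG_SELECTOR) + tagged_name + chr(TAG_CANCEL) + "\ue017\ue009"
pua_interpretation, stripped = split_pua_tag(example)
```

After this runs, pua_interpretation holds the string "Foobar.ttf" and
stripped holds the two characters U+E017 and U+E009. The result is just a
font name and two private-use code points.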

Now, what's the next step? What am I supposed to do to find out whether,
according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
are letters or not?

_ Marco


