John Cowan wrote:
>
> > <IMPORTANT NOTE:> This file uses characters assigned
> > to the Private Use Area of Unicode according to the
> > PUA scheme published at (URL). In order to view this
> > document, it will be necessary to obtain and install
> > the (font-name) font from (URL of font provider).
> > <end IMPORTANT NOTE>
>
> Well, this is fine if all you want to do is render the document.
> If you want to *process* the document, though, you
> need to have information on the properties of the PUA
> characters relevant to the document.
>
> However, I agree that no new representations are needed
> for this. It is sufficient just to extend the 3.x
> UnicodeData and *Properties files.
>
The ability to correctly display text is important.
Anything beyond that would perhaps be better stored as
part of the PUA scheme itself at the referenced URL.
(This could be in plain text format designed to be used
to extend the UnicodeData files.)
Or, in the case of TTF/OTF, there's a table within the font
(GDEF = glyph definition) which allows some rudimentary
properties for glyphs. (This font table isn't yet widely
supported.)
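
As a rough illustration of the plain-text option above, here is a minimal
Python sketch that reads PUA property lines written in the same
15-field, semicolon-separated layout as UnicodeData. The sample entries,
the character names, the "#" comment convention, and the function names
are all hypothetical; no published PUA scheme is being quoted here.

```python
from typing import Dict, NamedTuple

# The three Private Use ranges defined by Unicode.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


class CharProps(NamedTuple):
    name: str
    category: str   # General Category, e.g. "Lo"
    combining: int  # Canonical Combining Class
    bidi: str       # Bidirectional class, e.g. "L"


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def parse_pua_data(text: str) -> Dict[int, CharProps]:
    """Parse UnicodeData-style lines, accepting only PUA code points."""
    table: Dict[int, CharProps] = {}
    for raw in text.splitlines():
        # "#" comments are an assumption of this sketch, not part of UnicodeData.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {raw!r}")
        cp = int(fields[0], 16)
        if not is_private_use(cp):
            raise ValueError(f"U+{cp:04X} is not a Private Use code point")
        table[cp] = CharProps(fields[1], fields[2], int(fields[3]), fields[4])
    return table


# Hypothetical entries that a PUA scheme might publish at its URL.
SAMPLE = """\
# code;name;gc;ccc;bidi;decomp;dec;digit;num;mirrored;old name;comment;upper;lower;title
E000;EXAMPLE PUA LETTER A;Lo;0;L;;;;;N;;;;;
E001;EXAMPLE PUA COMBINING MARK;Mn;230;NSM;;;;;N;;;;;
"""

if __name__ == "__main__":
    for cp, props in parse_pua_data(SAMPLE).items():
        print(f"U+{cp:04X}", props)
```

A file in this shape would be fetched once from the scheme's URL and
shared by every document that uses the scheme, rather than being
repeated in each file.
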
To store all such information in each relevant file using
non-BMP characters does seem a bit much. Even without
any new representations, providing this data in each file
might work if the user had only one or two such files,
but wouldn't most users favoring a PUA encoding have
many files?

Earlier, someone brought up the idea that the format of
the tag could include an active link to download additional
data. If the tag must be in each file's header, what happens
if a user is looking at files off-line? Does the system read
the header of the file, determine that data is required on-line,
and then prompt the user to connect? Every time that file
or a similar file is opened?

Maybe it would be best to leave it to a file's
author to provide any necessary information or pointers.
If someone has accessed a file which uses the PUA and can't
read it, it may well be that the contents of that file are
supposed to be every bit as private as the Unicode area used.
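
For a reader who simply wants to know whether a given file uses the PUA
at all, a short check along the following lines might help. This is only
a sketch; the function name is made up for illustration, and the ranges
are the three Private Use ranges defined by Unicode.

```python
from typing import List

# BMP Private Use Area plus the two supplementary Private Use planes.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def private_use_code_points(text: str) -> List[int]:
    """Return the distinct Private Use code points in text, sorted."""
    found = {
        ord(ch)
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in PUA_RANGES)
    }
    return sorted(found)


if __name__ == "__main__":
    sample = "abc\uE000\uE000\U000F0001"
    print([f"U+{cp:04X}" for cp in private_use_code_points(sample)])
```

An empty result means the file needs no PUA scheme or special font; a
non-empty one tells the reader which code points to look up, or to ask
the author about.
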
Best regards,
James Kass.
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT