From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 19 2003 - 15:27:50 EST
From: "Peter Constable" <petercon@microsoft.com>
> A software product could assign every single PUA codepoint to mean some
> kind of formatting instruction, and insert these into the text like
> markup. In that case, a user's PUA characters will be re-interpreted by
> that software as formatting instructions. Is that product conformant?
> Yes. Is it useful? Not for that user.
With a very simple transcoder, you could remap all HTML markup, and the
extra line breaks used within that markup, onto 256 PUA code points. You
would get a file that contains ALL the HTML markup but still complies with
the Unicode plain-text definition. Converting it back to HTML would use a
reverse filter and would produce an HTML file without any PUA characters, so
it would be rendered correctly.
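To make the idea concrete, here is a minimal sketch of such a transcoder. It
rests on assumptions that are mine and not stated above: the 256 PUAs are
taken to be U+E000..U+E0FF, only tags (not entities or comments containing
'>') count as markup, markup characters all fall below U+0100, and the input
text does not already use that PUA range. The names to_pua and from_pua are
purely illustrative.

```python
import re

# Assumption: the 256 PUA code points are U+E000..U+E0FF, one per byte value.
PUA_BASE = 0xE000

# Naive tag matcher: assumes no '>' occurs inside attribute values.
# The negated class also matches line breaks inside a tag.
TAG = re.compile(r"<[^>]*>")


def to_pua(html: str) -> str:
    """Shift every character of each tag into the assumed PUA block."""
    def shift(match: re.Match) -> str:
        out = []
        for c in match.group(0):
            if ord(c) > 0xFF:
                raise ValueError(f"markup character U+{ord(c):04X} does not fit in 256 PUAs")
            out.append(chr(PUA_BASE + ord(c)))
        return "".join(out)
    return TAG.sub(shift, html)


def from_pua(text: str) -> str:
    """Reverse filter: shift the assumed PUA block back to the original markup."""
    return "".join(
        chr(ord(c) - PUA_BASE) if PUA_BASE <= ord(c) <= PUA_BASE + 0xFF else c
        for c in text
    )


if __name__ == "__main__":
    source = '<p class="x">Hi <b>there</b></p>'
    encoded = to_pua(source)
    print("<" in encoded)               # markup characters are gone
    print(from_pua(encoded) == source)  # the round trip restores the HTML
```

The round trip is only lossless under the assumptions listed; text that
already contains characters in U+E000..U+E0FF would be corrupted by the
reverse filter.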
The only problem is that PUAs have no defined rendering, and Unicode does
not specify ranges of PUAs for distinct uses, with distinct but predefined
_default_ character properties.
Why isn't there a range for Mn diacritics, a range for ideographic letters
or symbols, and a range for ignorable formatting controls (all of them with
combining class 0)? At least it would have allowed applications and renderers
to behave correctly even in the absence of support for those PUAs, by using
a correct _default_ rendering, instead of just displaying narrow white
boxes, or nothing...
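For illustration, the uniformity of the current defaults can be checked with
Python's standard unicodedata module. This is only a sketch: it samples the
first and last code point of each of the three private use ranges, and the
helper name pua_default_properties is mine.

```python
import unicodedata

# The three private use ranges: BMP PUA, plane 15, plane 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def pua_default_properties() -> set:
    """Collect (general category, combining class) at the ends of each range."""
    props = set()
    for first, last in PUA_RANGES:
        for cp in (first, last):
            ch = chr(cp)
            props.add((unicodedata.category(ch), unicodedata.combining(ch)))
    return props


if __name__ == "__main__":
    # A single pair is expected: every sampled PUA code point reports the
    # same default category and combining class.
    print(pua_default_properties())
```

Every sampled code point reports the same general category (Co) and
combining class 0, so a renderer has nothing to tell a would-be diacritic
from a would-be letter or format control.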
I don't know why this would break anything: documents could still use PUAs the
way they want, with their own semantics and behavior. But suggesting distinct
ranges for the default behavior would be a real bonus to help applications
adopt a coherent behavior when faced with unknown or unspecified PUAs.
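A rough sketch of what such a fallback could look like in an application
follows. The sub-range boundaries, the table HYPOTHETICAL_DEFAULTS and the
function names are entirely hypothetical; Unicode defines no such split, and
the numbers are chosen only to make the example run.

```python
import unicodedata

# HYPOTHETICAL split of the BMP PUA; nothing like this exists in Unicode.
HYPOTHETICAL_DEFAULTS = [
    (0xE000, 0xE0FF, "Cf"),  # ignorable formatting controls
    (0xE100, 0xE1FF, "Mn"),  # nonspacing diacritics
    (0xE200, 0xF8FF, "Lo"),  # letters, ideographs or symbols
]


def default_category(ch: str) -> str:
    """Return the hypothetical default category, else the real one."""
    cp = ord(ch)
    for first, last, category in HYPOTHETICAL_DEFAULTS:
        if first <= cp <= last:
            return category
    return unicodedata.category(ch)


def fallback_action(ch: str) -> str:
    """Pick a default rendering action for a PUA character nobody has defined."""
    category = default_category(ch)
    if category == "Cf":
        return "ignore"            # render nothing, take no width
    if category == "Mn":
        return "combine"           # attach to the preceding base character
    if category == "Lo":
        return "box"               # show a full-width placeholder glyph
    return "normal"                # not in the hypothetical table


if __name__ == "__main__":
    for sample in ("\ue010", "\ue110", "\ue300", "A"):
        print(f"U+{ord(sample):04X}", fallback_action(sample))
```

A document with its own private agreement would simply override these
defaults; they would only apply when the application knows nothing else
about the character.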
This archive was generated by hypermail 2.1.5 : Wed Nov 19 2003 - 16:21:39 EST