Re: Invalid code points

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Jun 04 2009 - 22:54:32 CDT

Next message: Damon Anderson: "Re: Fonts across platforms...."

Previous message: verdy_p: "Re: Invalid code points"
In reply to: verdy_p: "Re: Invalid code points"

On 6/4/2009 7:22 PM, verdy_p wrote:
>
> I also agree that the only useful interest that I see for U+FFFC is as a placeholder when it is needed for indicating the position where an external binary object is to be inserted...

So far so good.

The rest of your message mixes some good points with a bit of speculation.

You are correct that for an XML document, or any other "plain text
encoded" higher-level protocol, one would not use U+FFFC but would
instead use that protocol's own syntax constructs.
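As an illustration, a serializer targeting XML would emit markup at the object's position rather than a U+FFFC. This is a minimal sketch that assumes a hypothetical document model of text strings and inline objects. The `InlineObject` type and the `<para>`/`<object>` element names are invented for the example and do not belong to any standard schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Union


@dataclass
class InlineObject:
    """Hypothetical inline object, e.g. an embedded image."""
    src: str


def to_xml(parts: List[Union[str, InlineObject]]) -> str:
    """Serialize text and inline objects as XML markup; no U+FFFC is needed."""
    root = ET.Element("para")
    last = None  # most recent child element; following text goes in its .tail
    for part in parts:
        if isinstance(part, InlineObject):
            # The protocol's own syntax marks the object's position.
            last = ET.SubElement(root, "object", src=part.src)
        elif last is None:
            root.text = (root.text or "") + part
        else:
            last.tail = (last.tail or "") + part
    return ET.tostring(root, encoding="unicode")


out = to_xml(["See ", InlineObject("chart.png"), " for details."])
print(out)
```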

You are also correct that the information "an object was inserted here"
is of limited use when a rich text file is converted to plain text.
There might be users who would wish to have this information, but most
systems don't insert a U+FFFC in that case.

You get into speculation when you try to imagine its possible actual
uses. The character was encoded not primarily for data
interchange, but to solve a common implementation problem: inline images
and objects can be formatted like characters (for example, underlines
might be applied to them). By providing an actual *character* in the
text buffer, such text formatting can be kept regular (i.e. all
character styling applies to actual character offsets).
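The implementation problem described above can be sketched with a toy buffer. In this sketch each inline object occupies exactly one U+FFFC position, so a style range such as an underline covers the object like any other character. The `RichBuffer` class and its methods are hypothetical and are not taken from any real editor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

OBJECT_REPLACEMENT = "\uFFFC"


@dataclass
class RichBuffer:
    """Hypothetical rich-text buffer: characters plus U+FFFC for inline objects."""
    text: str = ""
    objects: Dict[int, str] = field(default_factory=dict)  # offset -> object payload
    styles: List[Tuple[int, int, str]] = field(default_factory=list)  # (start, end, style)

    def append_text(self, s: str) -> None:
        self.text += s

    def append_object(self, payload: str) -> None:
        # The object takes up exactly one character position in the buffer.
        self.objects[len(self.text)] = payload
        self.text += OBJECT_REPLACEMENT

    def apply_style(self, start: int, end: int, style: str) -> None:
        # Styles are plain character ranges; objects need no special case.
        self.styles.append((start, end, style))

    def styles_at(self, offset: int) -> List[str]:
        return [s for (a, b, s) in self.styles if a <= offset < b]


buf = RichBuffer()
buf.append_text("Fig ")
buf.append_object("chart.png")
buf.append_text(" here")
buf.apply_style(0, len(buf.text), "underline")  # underline everything, object included
obj_offset = buf.text.index(OBJECT_REPLACEMENT)
print(obj_offset, buf.objects[obj_offset], buf.styles_at(obj_offset))
```

Because the object is a real character offset, the same range logic that styles letters also styles the image.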

Most *binary* data interchange protocols are based more or less directly
on the in-memory representation of a rich text document. For that
reason, it is those protocols that are most likely to contain a U+FFFC
in the text part of the binary data stream.

I know that is not intuitive, but that is what the character was encoded for.
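To see why a binary format modeled on the in-memory buffer ends up carrying U+FFFC, consider a toy serializer that simply dumps the buffer's text along with a separate object table. The layout below (length-prefixed UTF-16LE text followed by an object table) is invented purely for illustration and does not describe any real file format.

```python
import struct
from typing import List, Tuple

OBJECT_REPLACEMENT = "\uFFFC"


def dump(text: str, objects: List[Tuple[int, bytes]]) -> bytes:
    """Invented toy layout: u32 text byte length, UTF-16LE text,
    u32 object count, then per object: u32 char offset, u32 payload length, payload."""
    encoded = text.encode("utf-16-le")
    out = struct.pack("<I", len(encoded)) + encoded
    out += struct.pack("<I", len(objects))
    for offset, payload in objects:
        out += struct.pack("<II", offset, len(payload)) + payload
    return out


def load_text(blob: bytes) -> str:
    """Read back only the text part; the U+FFFC placeholders are still in it."""
    (n,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + n].decode("utf-16-le")


text = "Fig " + OBJECT_REPLACEMENT + " here"
blob = dump(text, [(4, b"\x89PNG...")])
print(repr(load_text(blob)))
```

Since the text section is a straight copy of the buffer, the placeholder travels with it, while the object data lives elsewhere in the stream.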

Later, much later, the UTC realized that there were other, similar needs
to have "internal-use" code points that are stripped out during
plain-text conversion. This has led to the concept of noncharacters,
and the 34 existing, permanently reserved code points were augmented by
32 newly designated noncharacters, to give a set of 66 codes that can be
used for similar, internal-use placeholders.
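The count of 66 follows from the ranges the Unicode Standard designates as noncharacters: the last two code points of each of the 17 planes (U+xxFFFE and U+xxFFFF, 34 in all) plus U+FDD0..U+FDEF (32). The helper names in this sketch are hypothetical. The stripping function shows one way such internal-use placeholders might be removed during plain-text export.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters:
    U+FDD0..U+FDEF (32) plus U+xxFFFE and U+xxFFFF in each of 17 planes (34)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_internal_use(s: str) -> str:
    """Drop noncharacters, e.g. internal placeholders, on plain-text export."""
    return "".join(ch for ch in s if not is_noncharacter(ord(ch)))


count = sum(is_noncharacter(cp) for cp in range(0x110000))
print(count)  # 66
print(is_noncharacter(0xFFFC))  # False: U+FFFC is an ordinary character
```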

The U+FFFC OBJECT REPLACEMENT CHARACTER was left as is: it is a
character, not a noncharacter, which makes its use in plain text
optional. You may use it to indicate where an object had to be stripped,
but many implementations choose not to.
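The choice described above can be expressed as an option on a plain-text export routine. This is a minimal sketch assuming a hypothetical document model. `InlineObject`, `to_plain_text`, and the `keep_placeholder` flag are illustrative names and are not part of any real library.

```python
from typing import List, Union

OBJECT_REPLACEMENT = "\uFFFC"


class InlineObject:
    """Hypothetical stand-in for an embedded image or other object."""
    def __init__(self, name: str) -> None:
        self.name = name


def to_plain_text(parts: List[Union[str, InlineObject]],
                  keep_placeholder: bool = False) -> str:
    """Flatten rich content to plain text.

    Objects cannot be represented in plain text. By default they are
    dropped; optionally a U+FFFC marks where one was removed.
    """
    out = []
    for part in parts:
        if isinstance(part, InlineObject):
            if keep_placeholder:
                out.append(OBJECT_REPLACEMENT)
        else:
            out.append(part)
    return "".join(out)


doc = ["See ", InlineObject("chart.png"), " below."]
print(to_plain_text(doc))                         # object silently dropped
print(to_plain_text(doc, keep_placeholder=True))  # U+FFFC marks the object
```

Defaulting to dropping the object mirrors the observation that many implementations choose not to emit U+FFFC.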

A./


This archive was generated by hypermail 2.1.5 : Thu Jun 04 2009 - 22:57:47 CDT