An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Wed Aug 14 2002 - 09:04:50 EDT


Doug Ewell wrote as follows.

>Kenneth Whistler <kenw at sybase dot com> wrote:

[snipped]

>> These animals are more like U+FFFC -- they are internal anchors
>> that should not be exported, as there is no general expectation
>> that once exported to plain text, a receiver will have sufficient
>> context for making sense of them in the way the originator was
>> dealing with them internally.
>>

[snipped]

>This moves the entire issue out of the realm of poor support and into
>the big, dark, scary cavern of pre-deprecation.
>
>Unicode 3.0 doesn't say exactly what Ken says. Unicode 3.0 (p. 326)
>says the annotation characters should only be used under "prior
>agreement between the sender and the receiver because the content may be
>misinterpreted otherwise." Fine, no problem; those are the same rules
>that apply to the PUA. Ken, though, seems to say they shouldn't be
>exported at all, and furthermore they shouldn't even have been encoded
>in the first place, except that the noncharacters (which explicitly
>mustn't be interchanged) hadn't been invented yet.

It occurs to me that a convention could be introduced, either as part of the
Unicode specification or simply as a widely known practice, as follows. A
plain text Unicode file whose name has some particular extension (any ideas
for something like .uof, for Unicode object file?) accompanies another plain
text Unicode file whose name has an extension such as .txt, or indeed any
extension other than .uof (or whatever is chosen after discussion). The .uof
file contains, on successive lines of text, first the name of the text file
and then, in order, the names of the files which contain the objects for
which each U+FFFC character in the text file provides the anchor.

For example, a file with a name such as story7.uof might have the following
lines of text as its contents.

story7.txt
horse.gif
dog.gif
painting.jpg

The file story7.uof could thus be used with the file story7.txt so as to
indicate which objects are intended for the three uses of U+FFFC in the
file story7.txt, in the order in which they occur.
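
As a rough illustration only, here is a minimal sketch in Python of how a
receiving program might apply such a convention. The function names
parse_uof and pair_anchors are hypothetical, and so is the choice to treat
a mismatch between the number of U+FFFC characters and the number of listed
objects as an error, since the suggestion above does not say what should
happen in that case.

```python
OBJECT_ANCHOR = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(uof_text):
    """Split the contents of a .uof file into (text file name, object names).

    Assumption: blank lines are ignored and surrounding whitespace is trimmed.
    """
    lines = [line.strip() for line in uof_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("a .uof file must at least name the text file")
    return lines[0], lines[1:]


def pair_anchors(text, object_names):
    """Pair the offset of each U+FFFC in text with an object name, in order.

    Assumption: a count mismatch is reported as an error (hypothetical policy).
    """
    offsets = [i for i, ch in enumerate(text) if ch == OBJECT_ANCHOR]
    if len(offsets) != len(object_names):
        raise ValueError(
            f"{len(offsets)} anchors but {len(object_names)} objects listed"
        )
    return list(zip(offsets, object_names))


# The story7.uof example from above, with a made-up story7.txt for illustration.
uof_contents = "story7.txt\nhorse.gif\ndog.gif\npainting.jpg\n"
story_text = "A horse \ufffc met a dog \ufffc near a painting \ufffc."

text_name, objects = parse_uof(uof_contents)
pairs = pair_anchors(story_text, objects)
```

Here pairs associates each anchor position in the story text with the
corresponding file name from the .uof file; how the objects are then
rendered is left entirely to the receiving application.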

I have used .gif and .jpg graphics files for my example, but the format
could be left open so that a Java class file or anything else could be used
as the object that is anchored within the document.

There is no obligation that the first part of the file name of the .uof file
and of the .txt file should be the same, yet that would typically be a
useful thing to do.

I can imagine that such a practice, if widely used, might be helpful in
bridging the gap between being able to use a plain text file and having to
use some expensive wordprocessing package.

I am not saying that this suggestion fully solves all of the possible
implications of rendering and so forth. I am simply suggesting that having
such a convention would be a useful facility. Such a convention, because it
uses a special file extension, would not intrude upon the right of anybody
to devise their own convention.

As this concerns the U+FFFC character and the Unicode Technical Committee is
due to meet next week, I think it might be helpful if this idea is discussed
before the meeting, as a straightforward idea like this might mean that the
possibility of exchanging U+FFFC characters, for people who want to do so,
is not lost.

>Everybody will welcome the new conventional, graphical-type characters
>and scripts that are coming with Unicode 4.0.

What are those, please?

William Overington

14 August 2002



This archive was generated by hypermail 2.1.2 : Wed Aug 14 2002 - 08:42:54 EDT