Re: Question about U+FFFC

From: Mark Davis (mark@macchiato.com)
Date: Mon May 24 1999 - 12:33:16 EDT

Next message: Asmus Freytag: "Re: Question about U+FFFC"
Previous message: peter_constable@sil.org: "Re: Unicode corpus tools/missing characters"
Maybe in reply to: stephen_holmes@lionbridge.com: "Question about U+FFFC"
Next in thread: Asmus Freytag: "Re: Question about U+FFFC"

The OBJECT REPLACEMENT CHARACTER is not made for markup. Instead, it is
intended for representing an object in memory with out-of-band information.
That means that the text is kept as a continuous list of characters, with
the style information associated with it by means of indices into the text.
For example, the marked-up contents:

The quick brown <img...> jumped...

might be represented with an out-of-band structure in memory as something
like the following:

String text = "The quick brown \uFFFC jumped...";
StyleRun[] styleruns =                        // end offsets are exclusive
[0]: start = 0, end = 9, posture = ITALIC     // "The quick"
[1]: start = 4, end = 15, weight = BOLD       // "quick brown"
[2]: start = 16, end = 17, gif = {data}       // the U+FFFC itself

The OBJECT REPLACEMENT CHARACTER provides a base from which styles
indicating a graphic or other external object can be hung. The content of
the object is represented in the associated style run.
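
To make the pseudocode concrete, here is a minimal Java sketch. StyleRun, its fields, the objectAt helper and the "gif" key are hypothetical names chosen for illustration, and end offsets are treated as exclusive, so the run hanging from the U+FFFC at index 16 covers [16, 17). The byte array is placeholder data, not a real GIF.

```java
import java.util.List;

// Minimal sketch of the in-memory model: plain text plus out-of-band style runs.
class ObjectRunDemo {
    static final char OBJ = '\uFFFC'; // OBJECT REPLACEMENT CHARACTER

    // Returns the value of the "gif" run hung from the U+FFFC at index, or null.
    static Object objectAt(String text, List<StyleRun> runs, int index) {
        if (index < 0 || index >= text.length() || text.charAt(index) != OBJ) {
            return null; // only the object character carries an object
        }
        for (StyleRun r : runs) {
            if (r.covers(index) && r.key.equals("gif")) {
                return r.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "The quick brown \uFFFC jumped...";
        byte[] gifData = {0x47, 0x49, 0x46}; // placeholder bytes, not a real image
        List<StyleRun> runs = List.of(
                new StyleRun(0, 9, "posture", "ITALIC"),  // "The quick"
                new StyleRun(4, 15, "weight", "BOLD"),    // "quick brown"
                new StyleRun(16, 17, "gif", gifData));    // the U+FFFC itself
        int at = text.indexOf(OBJ);
        System.out.println(at);                                  // 16
        System.out.println(objectAt(text, runs, at) == gifData); // true
    }
}

// Hypothetical style run: the half-open range [start, end) plus one attribute.
final class StyleRun {
    final int start, end;
    final String key;   // e.g. "posture", "weight", "gif" (illustrative keys)
    final Object value;

    StyleRun(int start, int end, String key, Object value) {
        this.start = start;
        this.end = end;
        this.key = key;
        this.value = value;
    }

    boolean covers(int index) {
        return start <= index && index < end;
    }
}
```

Because the object is just a character at a known index, finding it is an ordinary string search; the run covering that index supplies the object's content.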

Typically, changes to the text go through routines that alter both the
character stream and the indices for the styles. If, as a result of such
changes, identical styles become adjacent, they are merged. When the range
for a style ends up selecting no text, it has no effect and is removed.
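
The bookkeeping described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the Run class, one attribute per run, and the rule that text inserted exactly at a run boundary stays outside the run are all choices made for illustration, not a prescribed design. Deleting the object character leaves its style run empty, so the run is dropped; deleting the gap between two identical runs makes them adjacent, so they are merged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Minimal sketch: text edits that keep out-of-band style runs in sync.
class StyledText {
    // Hypothetical run: half-open range [start, end) carrying one attribute.
    static final class Run {
        int start, end;
        final String key;
        final Object value;

        Run(int start, int end, String key, Object value) {
            this.start = start;
            this.end = end;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value + "[" + start + "," + end + ")";
        }
    }

    final StringBuilder text;
    final List<Run> runs = new ArrayList<>();

    StyledText(String s) {
        text = new StringBuilder(s);
    }

    void addRun(int start, int end, String key, Object value) {
        runs.add(new Run(start, end, key, value));
        normalize();
    }

    // Insert s at pos: runs strictly spanning pos grow; runs at or after pos shift right.
    void insert(int pos, String s) {
        text.insert(pos, s);
        int n = s.length();
        for (Run r : runs) {
            if (r.start >= pos) r.start += n;
            if (r.end > pos) r.end += n;
        }
        normalize();
    }

    // Delete [from, to): offsets inside the deleted span collapse to 'from'.
    void delete(int from, int to) {
        text.delete(from, to);
        for (Run r : runs) {
            r.start = shift(r.start, from, to);
            r.end = shift(r.end, from, to);
        }
        normalize();
    }

    private static int shift(int i, int from, int to) {
        if (i <= from) return i;
        if (i >= to) return i - (to - from);
        return from;
    }

    // Remove runs that select no text, then merge identical runs that touch or overlap.
    private void normalize() {
        runs.removeIf(r -> r.start >= r.end);
        runs.sort(Comparator.comparing((Run r) -> r.key).thenComparingInt(r -> r.start));
        List<Run> merged = new ArrayList<>();
        for (Run r : runs) {
            Run last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.key.equals(r.key)
                    && Objects.equals(last.value, r.value) && r.start <= last.end) {
                last.end = Math.max(last.end, r.end);
            } else {
                merged.add(r);
            }
        }
        runs.clear();
        runs.addAll(merged);
        runs.sort(Comparator.comparingInt((Run r) -> r.start));
    }

    public static void main(String[] args) {
        StyledText t = new StyledText("The quick brown \uFFFC jumped...");
        t.addRun(0, 9, "posture", "ITALIC");
        t.addRun(4, 15, "weight", "BOLD");
        t.addRun(16, 17, "gif", "{data}");
        t.delete(16, 17);           // delete the object character...
        System.out.println(t.runs); // [posture=ITALIC[0,9), weight=BOLD[4,15)]

        StyledText u = new StyledText("ab cd");
        u.addRun(0, 2, "weight", "BOLD");
        u.addRun(3, 5, "weight", "BOLD");
        u.delete(2, 3);             // the two BOLD runs become adjacent...
        System.out.println(u.runs); // [weight=BOLD[0,4)]
    }
}
```
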
Having a special character representing the graphic makes these style
modifications more uniform. In addition, by having a distinct character as
the base in the character stream, the software can flow it just like any
other character: the main difference is that instead of querying the font
system to find the metrics (bounds, origin, advance-width), it queries the
associated object (e.g., the picture) for that information. Because the
object has a distinct character value (as opposed to, say, putting the
style on an 'a'), the text can also be searched without complication.
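
A minimal Java sketch of the layout and search points, assuming a toy "font" with fixed advance widths and a hypothetical Embedded interface standing in for the picture. For brevity, objects are looked up in a map keyed by text index rather than through the style runs, and surrogate pairs are ignored.

```java
import java.util.Map;

// Minimal sketch of flowing a line of text that contains U+FFFC.
class LineMeasure {
    static final char OBJ = '\uFFFC'; // OBJECT REPLACEMENT CHARACTER

    // Hypothetical embedded object (e.g. a picture) that reports its own advance width.
    interface Embedded {
        float advance();
    }

    // Hypothetical stand-in for the font system: 4 units for a space, 8 otherwise.
    static float fontAdvance(char c) {
        return c == ' ' ? 4f : 8f;
    }

    // Sums advance widths across one line of (BMP-only) text. Ordinary characters
    // are measured by the "font"; at U+FFFC the attached object is asked instead.
    // 'objects' maps a text index to its object, a simplification of the style runs.
    static float measure(String text, Map<Integer, Embedded> objects) {
        float x = 0f;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == OBJ) {
                Embedded obj = objects.get(i);
                x += (obj != null) ? obj.advance() : 0f; // no object attached: zero width here
            } else {
                x += fontAdvance(c);
            }
        }
        return x;
    }

    public static void main(String[] args) {
        String text = "The quick brown \uFFFC jumped";
        Embedded picture = () -> 20f; // a picture 20 units wide
        System.out.println(measure(text, Map.of(16, picture))); // 188.0
        // Searching needs no special case: the object is not disguised as a letter.
        System.out.println(text.indexOf('a')); // -1
        System.out.println(text.indexOf(OBJ)); // 16
    }
}
```

The only branch in the loop is where the metrics come from; line breaking, caret movement and similar logic can treat U+FFFC as an ordinary character.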

All of these operations are implemented more simply if there is a single,
distinct base character representing objects in the stream. Although I
believe this was originally proposed by Microsoft, many text-processing
architectures use something like this. For example, see
http://www.javasoft.com/products/jdk/1.2/docs/api/java/awt/font/GraphicAttribute.html.
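
As one concrete instance of this approach, the Java classes behind the link above let a GraphicAttribute be hung from U+FFFC through TextAttribute.CHAR_REPLACEMENT on an AttributedString. The sketch below mirrors the earlier example; the filled rectangle is a stand-in for the picture, the styling offsets are hard-coded for that one string, and actual drawing (for example via java.awt.font.TextLayout) is omitted.

```java
import java.awt.font.GraphicAttribute;
import java.awt.font.ShapeGraphicAttribute;
import java.awt.font.TextAttribute;
import java.awt.geom.Rectangle2D;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;

// Minimal sketch using the java.awt.font / java.text classes; rendering is omitted.
class GraphicAttributeDemo {
    // Styles the example string; the 0..9 and 4..15 offsets match that string only.
    static AttributedString build(String text, GraphicAttribute picture) {
        int obj = text.indexOf('\uFFFC'); // assumes the text contains one U+FFFC
        AttributedString styled = new AttributedString(text);
        styled.addAttribute(TextAttribute.POSTURE, TextAttribute.POSTURE_OBLIQUE, 0, 9);
        styled.addAttribute(TextAttribute.WEIGHT, TextAttribute.WEIGHT_BOLD, 4, 15);
        // Hang the graphic from the single object character: range [obj, obj + 1).
        styled.addAttribute(TextAttribute.CHAR_REPLACEMENT, picture, obj, obj + 1);
        return styled;
    }

    public static void main(String[] args) {
        String text = "The quick brown \uFFFC jumped...";
        // A 20 x 10 filled rectangle stands in for the picture, resting on the Roman baseline.
        GraphicAttribute picture = new ShapeGraphicAttribute(
                new Rectangle2D.Float(0, -10, 20, 10),
                GraphicAttribute.ROMAN_BASELINE,
                ShapeGraphicAttribute.FILL);

        AttributedCharacterIterator it = build(text, picture).getIterator();
        it.setIndex(text.indexOf('\uFFFC'));
        System.out.println(it.getAttribute(TextAttribute.CHAR_REPLACEMENT) == picture); // true
        System.out.println(it.getRunStart(TextAttribute.CHAR_REPLACEMENT) + ".."
                + it.getRunLimit(TextAttribute.CHAR_REPLACEMENT));               // 16..17
        // Layout code asks the graphic attribute, not the font, for its metrics.
        System.out.println(picture.getAdvance());
    }
}
```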

Mark

----- Original Message -----
From: <stephen_holmes@lionbridge.com>
To: Unicode List <unicode@unicode.org>
Sent: Monday, May 24, 1999 04:09 AM
Subject: Question about U+FFFC

>
> Hi there,
>
> This character is defined in the standard as OBJECT REPLACEMENT CHARACTER,
> used, as I understand it, to facilitate the insertion of objects outside
> the scope of a normal Unicode text stream. Would someone have an example
> of how this is used (pseudo-code/algorithm)?
>
> I.e., should you delimit the inserted object with FFFC, and how is the
> object data typically represented in the stream? For example, if I wanted
> to represent an inserted image file, would it look like this?
>
> \uFFFC C:\MyImages\Image.GIF \uFFFC
>
> Any clarification would be greatly appreciated.
>
> Thanks
> Steve.
>
>
> --------------------------------------
> Stephen Holmes, Engineering Manager
>
> Lionbridge Technologies, Grattan House
> Temple Road, Blackrock.
> Co. Dublin. IRELAND
>
> Tel: +353-1-283-6050 x 118
> Fax: +353-1-288-6220
> Web: http://www.lionbridge.com
> --------------------------------------

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT