From: Eric Muller (emuller@adobe.com)
Date: Sat Feb 09 2008 - 19:26:03 CST
James Kass wrote:
>
> Now, I don't know where those extra spaces are coming from, but I bet
> they make searching difficult.
Short answer:
Acrobat (Pro and Reader) attempts to reconstruct the text correctly,
even under adversarial conditions. The spaces are the result of trying to
obtain the best results across a wide range of PDF documents.
Slightly longer answer:
In many cases, PDF generation is hooked in at a fairly late stage of the
pipeline that goes from the user input to a printed image. For an input
like "the car" you can end up with PDF content of the form (using a
pseudo-notation):
(the car) showstring
or
(the) showstring 50 advance (car) showstring
To accommodate the latter case, Acrobat needs to generate a space
character even though there is no space glyph. Because there are many
complications of the same nature, the conditions under which to generate
a space character are non-trivial and most likely involve some
compromises. Furthermore, it is quite likely that the class of PDFs
corresponding to Indic texts was not considered when determining those
conditions.
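To make the trade-off concrete, here is a minimal sketch of the kind of rule described above, using the same pseudo-notation: shown strings are concatenated, and an advance becomes a space only when it exceeds a threshold. The `Show`/`Advance` types, the `extract_text` function and the threshold value of 30 are all hypothetical; Acrobat's actual conditions are not described here and are certainly more involved.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Show:
    text: str  # characters drawn by a "showstring" operation


@dataclass
class Advance:
    amount: float  # horizontal move, in the same unspecified units as "50 advance"


Op = Union[Show, Advance]

# Hypothetical cut-off: moves at least this wide are treated as word breaks,
# smaller ones (e.g. kerning) are ignored. Not Acrobat's real value.
SPACE_THRESHOLD = 30.0


def extract_text(ops: List[Op], threshold: float = SPACE_THRESHOLD) -> str:
    """Rebuild text from show/advance operations, synthesizing spaces."""
    text = ""
    for op in ops:
        if isinstance(op, Show):
            text += op.text
        elif op.amount >= threshold and text and not text.endswith(" "):
            # No space glyph was drawn, but the gap looks like a word break.
            text += " "
    return text


print(extract_text([Show("the car")]))                        # the car
print(extract_text([Show("the"), Advance(50), Show("car")]))  # the car
print(extract_text([Show("the"), Advance(5), Show("car")]))   # thecar
```

The compromise shows up as soon as one threshold must serve every document: any gap inside a word that happens to exceed it (for example `[Show("ab"), Advance(40), Show("c")]`, which yields `"ab c"`) produces a spurious space, while lowering the threshold risks turning ordinary kerning into spaces.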
Maybe the conditions actually coded in Acrobat can be refined to work
better for Indic texts; maybe there are inherent conflicts with other
PDFs (I just don't know).
Eric.
This archive was generated by hypermail 2.1.5 : Sat Feb 09 2008 - 19:29:04 CST