On 29/09/2014 11:02 PM, "Frédéric Grosshans" <frederic.grosshans_at_gmail.com>
wrote:
>
> Le 27/09/2014 01:10, Andrew Cunningham a écrit :
>
>> * NEVER try to copy and paste text from PDF. It is a preprint format and
>> should be treated as such.
>
> Well... Having access to the raw text is often useful (for example, to
> allow blind people to access the content of PDF documents, or to search for a
> word in a scanned historical document), and cutting and pasting text from PDF
> often works, even if the “rich text” formatting is lost.
>
The problem is that the extracted text often isn't the same as the
original text used to generate the PDF.
Results will vary according to the fonts and tools used to generate the
PDF. Even Adobe Acrobat contains different tools which can give vastly
different results.
It is best to think of PDF as dealing with glyphs rather than characters.
I mainly work with complex scripts, and the results with those are
usually not encouraging. I know there is ActualText, but honestly I don't
ever remember seeing a complex-script PDF I could copy and paste
from without post-processing the text.
The average person creating PDF files has no knowledge of how to achieve
optimal results.
N’Ko is, thankfully, one of the easier scripts to deal with.
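As an illustration of the kind of post-processing check mentioned above (a minimal sketch, not something from the thread), the Python snippet below lists the code points behind a piece of pasted text and compares two strings after NFC normalization. Dumping code points is one way to spot text that looks right as glyphs but is wrong as characters, such as reordered diacritics or substituted letters. The function names `describe_code_points` and `same_text` are hypothetical helpers; only the standard `unicodedata` module is assumed.

```python
import unicodedata


def describe_code_points(text: str) -> list[str]:
    """Return one line per code point: U+XXXX, canonical combining class, name."""
    rows = []
    for ch in text:
        # Unassigned or unnamed code points fall back to a placeholder name.
        name = unicodedata.name(ch, "<unnamed>")
        ccc = unicodedata.combining(ch)  # 0 for base (non-combining) characters
        rows.append(f"U+{ord(ch):04X} ccc={ccc} {name}")
    return rows


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (one glyph, many encodings)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    precomposed = "\u00e9"  # é as a single code point
    decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT, same glyph on screen
    print(same_text(precomposed, decomposed))
    for row in describe_code_points(decomposed):
        print(row)
```

The same visible glyph "é" can be stored as one or two code points, which is why comparing pasted text against a source usually needs normalization first, and why a mark attached to the wrong base character only shows up once the code points are listed.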
> In the case of the Ebola FAQs
> (https://sites.google.com/site/athinkra/ebola-faqs) discussed here, it
> worked almost perfectly on my computer (Ubuntu Linux 14.04) for N’Ko
> (diacritics are shifted by one character) and Vai. Of course, the Adlam was
> not working (somehow converted to Arabic), but that was expected, since Adlam
> is not (yet?) in Unicode.
_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode
Received on Mon Sep 29 2014 - 15:04:55 CDT
This archive was generated by hypermail 2.2.0 : Mon Sep 29 2014 - 15:04:56 CDT