Joined "ti" coded as "O" in PDF
hospes02 at scholarsfonts.net
Sat May 7 12:00:31 CDT 2016
I agree that it's a real-world problem -- PDFs really should be
searchable -- but I do not see that it's a Unicode issue. Unicode
defines the basic building blocks of LATIN SMALL LETTER T and LATIN
SMALL LETTER I; that's its job. Unicode is not responsible for
constructing fonts or for creating PDF software. Furthermore, even if Unicode did
want to do something about it, I can't imagine what that could be --
aside perhaps from using its bully pulpit to urge PDF creators and font
creators to do their jobs better.
The fact that some PDF apps do not search and copy/paste text correctly
when unencoded characters are given PUA values has been known for many
years. In the case of Calibri, I looked at the font (the version installed
on my Win7 system) and found that the 'ti' ligature is named t_i, which
follows good glyph-naming practice, and that it has no PUA assignment.
Given this, any well-constructed PDF app should be able to decode the
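
As a rough illustration of why a glyph name like t_i is enough to recover
the text, here is a minimal Python sketch of the name-to-text step in the
style of the Adobe Glyph List naming conventions: drop any suffix after the
first period, split on underscores, and map each component to a character.
The function names and the tiny SAMPLE_AGL table are placeholders made up
for this example (a real decoder would load the full glyph list), and the
surrogate and range checks are simplified.

```python
import re

# Hypothetical stand-in for the full Adobe Glyph List (illustration only).
SAMPLE_AGL = {"t": "t", "i": "i", "f": "f", "l": "l", "space": " "}


def _scalar(code_point):
    """Return the character for a valid Unicode scalar value, else ''."""
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return ""
    return chr(code_point)


def component_to_text(component):
    """Map one underscore-separated component of a glyph name to text."""
    # 1. Known glyph name (here: the small sample table above).
    if component in SAMPLE_AGL:
        return SAMPLE_AGL[component]
    # 2. "uni" followed by one or more groups of four uppercase hex digits.
    match = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", component)
    if match:
        digits = match.group(1)
        return "".join(
            _scalar(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
        )
    # 3. "u" followed by four to six uppercase hex digits.
    match = re.fullmatch(r"u([0-9A-F]{4,6})", component)
    if match:
        return _scalar(int(match.group(1), 16))
    # 4. Anything else carries no recoverable text.
    return ""


def glyph_name_to_text(glyph_name):
    """Recover the text for a glyph name such as 't_i' or 't_i.liga'."""
    base = glyph_name.split(".", 1)[0]  # drop ".liga"-style suffixes
    return "".join(component_to_text(part) for part in base.split("_"))


if __name__ == "__main__":
    for name in ("t_i", "t_i.liga", "uni00740069", "f_f_i"):
        print(name, "->", glyph_name_to_text(name))
```

Under these assumptions, a ligature glyph named t_i decodes to the two
letters "t" and "i" with no PUA code point involved, which is what searching
and copy/paste need.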
On 5/6/2016 11:49 AM, Steve Swales wrote:
> This discussion seems to have fizzled out, but I’m concerned that
> there’s a real-world problem here which is at least partially the
> concern of the consortium, so let me stir the pot and see if there’s
> still any meat left.
> On the current release of MacOS (including the developer beta, for
> your reference, Peter), if you use Calibri font, for example, in any
> app (e.g. notes), to write words with “ti” (like
> internationalization), then press “Print” and “Open PDF in Preview”,
> you get a PDF document with the joined “ti”. Subsequently cutting and
> pasting produces mojibake, and searching the document for words
> with “ti” doesn’t work, as previously noted.
> I suppose we can look on this as purely a font handling/MacOS bug, but
> I’m wondering if we should be providing accommodations or conveniences
> in Unicode for it to work as desired.