[Corpora-List] Comments on PDF Conversion
Tom Emerson
tree at basistech.com
Tue Mar 28 21:25:23 UTC 2006
Ken Litkowski writes:
> I only have Acrobat reader, so I can't create in it. But, it seems to
> me that it should be like any other word processor where you can insert
> things like footnotes, headers, figures, tables, etc. With at least
> WordPerfect (with its reveal codes), you can see that codes are used to
> mark things up. Mustn't Adobe have something similar in Acrobat?
When you add information to a page in PDF you are adding information
at a given coordinate position on the page. While PDF has "structured
extensions" that creating apps can use, you cannot and should not rely
on these. Indeed, it is not uncommon to come across a PDF that doesn't
have any text at all in it: the pages are bitmaps. This is especially
true of PDF files that were created by scanning a document. Sometimes
OCR is performed to associate text with the image, but this cannot
be relied on.
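To make the coordinate point concrete, here is a rough sketch in Python of
what a converter has to do once it has pulled positioned text out of a page.
It is not taken from any particular tool: the function name
fragments_to_lines, the (x, y, text) tuple layout, and the y_tolerance
parameter are all assumptions made up for illustration. It assumes the usual
PDF convention that the origin is at the bottom left of the page, so a larger
y is higher up.

```python
def fragments_to_lines(fragments, y_tolerance=2.0):
    """Rebuild text lines from positioned fragments.

    fragments: iterable of (x, y, text) tuples (hypothetical layout).
    y_tolerance: how far apart two baselines may be and still count
                 as the same line (an assumed, tunable value).
    """
    lines = []  # each entry: (baseline_y, [(x, text), ...])
    # Walk the fragments from the top of the page to the bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            # Close enough to the current baseline: same line.
            lines[-1][1].append((x, text))
        else:
            # Otherwise start a new line at this baseline.
            lines.append((y, [(x, text)]))
    # Within each line, order fragments left to right and join them.
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]


# Fragments arrive in drawing order, which need not be reading order.
page = [
    (72, 700, "Hello"),
    (300, 650, "1"),
    (110, 700.5, "world"),
    (72, 650, "Footnote"),
]
print(fragments_to_lines(page))
```

Note that nothing in the input says "this is a footnote" or "this is a
header": anything like that has to be guessed from geometry, and a scanned
page with no OCR would give an empty list.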
One vendor I haven't seen mentioned is PDFlib GmbH, though this is a
commercial solution so it may not be useful to you:
http://www.pdflib.com/index.htm
Their tools choke on documents containing esoteric content (e.g., Indic
or other complex scripts), but are generally pretty good, particularly
for English.
--
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"You can't fake quality any more than you can fake a good meal." (W.S.B.)