[Corpora-List] Comments on PDF Conversion

Tom Emerson tree at basistech.com
Tue Mar 28 21:25:23 UTC 2006


Ken Litkowski writes:
> I only have Acrobat reader, so I can't create in it.  But, it seems to 
> me that it should be like any other word processor where you can insert 
> things like footnotes, headers, figures, tables, etc.  With at least 
> WordPerfect (with its reveal codes), you can see that codes are used to 
> mark things up.  Mustn't Adobe have something similar in Acrobat?

When you add information to a page in PDF you are adding information
at a given coordinate position on the page. While PDF has "structured
extensions" that creating apps can use, you cannot and should not rely
on these. Indeed, it is not uncommon to come across a PDF that doesn't
have any text at all in it: the pages are bitmaps. This is especially
true of PDF files that were created by scanning a document. Sometimes
OCR is performed to associate text with the image, but this cannot
be relied on.
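
To make the coordinate point concrete, here is a rough Python sketch of
what an extractor has to do once it has pulled positioned text off a
page. Everything in it is an assumption for illustration: the
(x, y, text) tuples stand in for whatever your PDF library actually
returns, rebuild_lines and looks_image_only are made-up names, and the
two-point tolerance is a guess. It assumes the default PDF coordinate
system, with the origin at the bottom-left of the page.

```python
def rebuild_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into lines of text.

    Assumes the origin is at the bottom-left of the page, so a larger
    y means higher up. Fragments whose y differs from the first
    fragment of the current line by at most y_tolerance are treated
    as belonging to that line.
    """
    lines = []  # each entry: (y of the line's first fragment, [(x, text), ...])
    # Walk the fragments from the top of the page to the bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    # Within each line, order the fragments left to right.
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]


def looks_image_only(fragments):
    """True when a page yielded no non-blank text (e.g. a scan without OCR)."""
    return not any(text.strip() for _, _, text in fragments)


# Hypothetical fragments in arbitrary order, as they might come off a page.
page = [
    (72, 700, "Hello"),
    (200, 650, "line"),
    (110, 700.5, "world"),
    (72, 650, "Second"),
]
print(rebuild_lines(page))
print(looks_image_only([]))
```

Note that nothing here knows about footnotes, headers, or columns. Any
such structure has to be guessed from the positions alone, which is why
multi-column layouts and tables come out scrambled so often.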

One vendor I haven't seen mentioned is PDFlib GmbH, though this is a
commercial solution so it may not be useful to you:

http://www.pdflib.com/index.htm

Their tools choke on documents containing esoteric content (e.g.,
Indic or other complex scripts), but are generally pretty good,
particularly for English.

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)


