[Corpora-List] PDF Conversion
Ken Litkowski
ken at clres.com
Tue Mar 28 15:35:03 UTC 2006
Is anyone aware of free software that will process PDF documents into
text streams? There is a PDF2HTML (with an XML option) that will create
page-centric versions, but it does not really distinguish text from
formatting. I want to ignore (or be able to treat separately) such things
as headers, footnotes, tables, figures, and equations. (Note that even
Google retains the page-centric view.)
Thanks,
Ken
--
Ken Litkowski TEL.: 301-482-0237
CL Research EMAIL: ken at clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA Home Page: http://www.clres.com