[Corpora-List] PDF Conversion

Ken Litkowski ken at clres.com
Tue Mar 28 15:35:03 UTC 2006


Is anyone aware of free software that will process PDF documents into 
text streams?  There is a PDF2HTML (with an XML option) that will create 
page-centric versions, but this does not really distinguish text from 
format.  I want to ignore (or be able to treat separately) such things 
as headers, footnotes, tables, figures, and equations.  (Note that even 
Google retains the page-centric view.)
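One rough interim workaround, offered only as a sketch, is to post-process page-centric output. This assumes a converter that emits plain text with pages separated by form-feed characters; Xpdf's pdftotext does this by default. Lines that recur on many pages are treated as running headers or footers and dropped, along with bare page numbers, and the remaining lines are joined into a single stream. The function name `pages_to_stream` and its `min_fraction` threshold are hypothetical. This does nothing for footnotes, tables, figures, or equations, and it will also drop a genuine body line that happens to repeat on several pages.

```python
from collections import Counter
import re


def pages_to_stream(raw: str, min_fraction: float = 0.5) -> str:
    """Join page-centric text (pages separated by form feeds) into one
    stream, dropping likely running headers/footers and bare page numbers.

    Hypothetical heuristic: a line counts as a header/footer if it appears
    on at least max(2, int(pages * min_fraction)) distinct pages.
    """
    # Split on form feed (assumed page separator) and discard empty pages.
    pages = [p.splitlines() for p in raw.split("\f") if p.strip()]
    if not pages:
        return ""

    # Count on how many distinct pages each stripped, non-empty line occurs.
    counts = Counter()
    for lines in pages:
        counts.update({ln.strip() for ln in lines if ln.strip()})

    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {ln for ln, n in counts.items() if n >= threshold}

    kept = []
    for lines in pages:
        for ln in lines:
            s = ln.strip()
            if not s or s in repeated:
                continue  # blank line or probable running header/footer
            if re.fullmatch(r"\d+", s):
                continue  # bare page number
            kept.append(s)
    return "\n".join(kept)
```

For example, `pages_to_stream("Journal X\nFirst para line.\n1\n\fJournal X\nSecond page text.\n2\n")` keeps only the two body lines. Counting per page (via a set) rather than per occurrence keeps a line that repeats within one page from being mistaken for a header.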

Thanks,
	Ken
-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com
