[Corpora-List] PDF Conversion

radev at umich.edu
Tue Mar 28 19:15:35 UTC 2006


My student Alex C de Baca recommended this software:

http://www.foolabs.com/xpdf/index.html
http://www.bluem.net/downloads/pdftotext_en/
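Here is a minimal sketch, assuming the `pdftotext` binary from the xpdf package above is on the PATH. It shows the tool being driven from a script, plus a simple heuristic for dropping the running headers and footers Ken mentions. The helper names `pdf_to_text` and `strip_repeated_margins`, and the rule "a line that recurs at the top or bottom of at least half the pages is a header/footer", are illustrative assumptions, not xpdf features. The sketch relies on pdftotext's default behaviour of ending each page with a form-feed character. Footnotes, tables, figures and equations are not handled.

```python
import re
import subprocess
from collections import Counter


def pdf_to_text(pdf_path: str, layout: bool = False) -> str:
    """Run pdftotext (assumed to be on PATH) and return its output."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        cmd.append("-layout")  # keep the physical layout (columns, spacing)
    cmd += [pdf_path, "-"]  # "-" sends the text to stdout
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8")


def _key(line: str) -> str:
    # Collapse digit runs so "Page 3" and "Page 12" compare as equal.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_margins(text: str, margin: int = 2,
                           min_share: float = 0.5) -> list[str]:
    """Drop likely running headers/footers; return one string per page.

    Pages are split on form feeds. A line among the first or last `margin`
    non-blank lines of a page is removed if the same (digit-normalised) line
    sits in such a position on at least `min_share` of the pages and on at
    least two pages. Blank lines between paragraphs are kept.
    """
    if margin < 1:
        raise ValueError("margin must be at least 1")
    pages = [p.splitlines() for p in text.split("\f") if p.strip()]
    edges = []  # per page: indices of candidate header/footer lines
    counts = Counter()
    for lines in pages:
        content = [i for i, line in enumerate(lines) if line.strip()]
        edge = set(content[:margin] + content[-margin:])
        edges.append(edge)
        counts.update({_key(lines[i]) for i in edge})  # once per page
    threshold = max(2, min_share * len(pages))
    repeated = {k for k, n in counts.items() if n >= threshold}
    cleaned = []
    for lines, edge in zip(pages, edges):
        kept = [line for i, line in enumerate(lines)
                if not (i in edge and _key(line) in repeated)]
        cleaned.append("\n".join(kept).strip("\n"))
    return cleaned
```

For example, `strip_repeated_margins(pdf_to_text("paper.pdf"))` (with a hypothetical `paper.pdf`) returns one cleaned string per page. Comparing lines only at page edges keeps body text that happens to repeat from being removed.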


Ken Litkowski wrote:
> 
> Is anyone aware of free software that will process PDF documents into 
> text streams?  There is a PDF2HTML (with an XML option) that will create 
> page-centric versions, but this does not really distinguish text from 
> format.  I want to ignore (or be able to treat separately) such things 
> as headers, footnotes, tables, figures, and equations.  (Note that even 
> Google retains the page-centric view.)
> 
> Thanks,
> 	Ken
> -- 
> Ken Litkowski                     TEL.: 301-482-0237
> CL Research                       EMAIL: ken at clres.com
> 9208 Gue Road
> Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com


-- 
Dragomir R. Radev                                         radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev
