[Corpora-List] PDF Conversion
radev at umich.edu
Tue Mar 28 19:15:35 UTC 2006
My student Alex C de Baca recommended this software:
http://www.foolabs.com/xpdf/index.html
http://www.bluem.net/downloads/pdftotext_en/
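
As a rough illustration of how the pdftotext tool from the Xpdf package above might be driven from a script, here is a minimal Python sketch. It assumes pdftotext is installed and on the PATH. The function names (build_command, pdf_to_text, drop_running_headers) are hypothetical, and the header filter is only a simple heuristic, not something pdftotext does itself: it relies on pdftotext separating pages with a form feed character and drops a first line that repeats on most pages. Footnotes, tables, figures, and equations are not handled.

```python
import subprocess
from collections import Counter


def build_command(pdf_path, keep_layout=False):
    """Return the pdftotext argument list; "-" sends the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        cmd.append("-layout")  # keep the physical page layout
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, keep_layout=False):
    """Run pdftotext (assumed to be on the PATH) and return its output."""
    result = subprocess.run(
        build_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def drop_running_headers(text, min_share=0.5):
    """Heuristic: drop a first line that repeats on at least min_share of pages."""
    # pdftotext separates pages with a form feed character.
    pages = [p.strip() for p in text.split("\f") if p.strip()]
    first_lines = Counter(p.splitlines()[0].strip() for p in pages)
    repeated = {
        line
        for line, count in first_lines.items()
        if len(pages) > 1 and count / len(pages) >= min_share
    }
    cleaned = []
    for page in pages:
        lines = page.splitlines()
        if lines[0].strip() in repeated:
            lines = lines[1:]  # treat the repeated first line as a header
        cleaned.append("\n".join(lines))
    return "\n".join(cleaned)
```

A call such as drop_running_headers(pdf_to_text("paper.pdf")) would then give a single text stream with the most obvious running headers removed; "paper.pdf" is a placeholder file name.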
Ken Litkowski wrote:
>
> Is anyone aware of free software that will process PDF documents into
> text streams? There is a PDF2HTML (with an XML option) that will create
> page-centric versions, but this does not really distinguish text from
> format. I want to ignore (or be able to treat separately) such things
> as headers, footnotes, tables, figures, and equations. (Note that even
> Google retains the page-centric view.)
>
> Thanks,
> Ken
> --
> Ken Litkowski TEL.: 301-482-0237
> CL Research EMAIL: ken at clres.com
> 9208 Gue Road
> Damascus, MD 20872-1025 USA Home Page: http://www.clres.com
>
--
Dragomir R. Radev radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225 Fax: 734-764-2475 http://www.si.umich.edu/~radev