[Corpora-List] PDF Conversion
Hamish Cunningham
hamish at dcs.shef.ac.uk
Tue Mar 28 15:50:16 UTC 2006
Ted Briscoe's group in Cambridge have a PDF converter - you might contact
them.
Best
Hamish
Tom Emerson wrote:
> Ken Litkowski writes:
>
>>Is anyone aware of free software that will process PDF documents into
>>text streams? There is a PDF2HTML (with an XML option) that will create
>>page-centric versions, but this does not really distinguish text from
>>format. I want to ignore (or be able to treat separately) such things
>>as headers, footnotes, tables, figures, and equations. (Note that even
>>Google retains the page-centric view.)
>
>
> Given that PDF is a page-centric format, you are unlikely to find
> something that does what you are looking for: headers, footnotes,
> tables, etc. are not going to be distinguished from the surrounding
> content in any special way.
>
--
Hamish
http://www.dcs.shef.ac.uk/~hamish/