[Corpora-List] PDF Conversion
Alexander Osherenko
osherenko at gmx.de
Tue Mar 28 15:57:26 UTC 2006
Hi Ken,
I have worked with the PDF2HTML tool, and my experience is that although it
is free software, you still pay for it with your time and temper :) The
tool produces imprecise results: HTML tags or footnotes in the wrong order,
and invalid tag sequences such as <b><i></b></i>, to name one.
Nevertheless, once you have finished your first experiments with the tool,
you may find that you have become quite an expert in PDF, HTML, and
PDF2HTML, and that the tool is actually not so bad...
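For the plain text stream Ken asks about, one possible workaround for the
misnested tags is to ignore the tag structure entirely and keep only the
character data. What follows is only a minimal sketch in Python, assuming the
converter's HTML output has already been read into a string. The names
TextExtractor and html_to_text are hypothetical and not part of PDF2HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and ignores all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, whether or not
        # the surrounding tags are properly nested.
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    # Misnested bold/italic tags do not matter here, because the
    # extractor never looks at the tag structure.
    print(html_to_text("<p><b><i>Hello</b></i> world</p>"))
```

This does not separate headers, footnotes, tables, figures, or equations
from the running text; it only sidesteps the broken tag nesting.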
Sorry if my answer is somewhat confusing, but I hope it helps.
Cheers
Alexander
Ken Litkowski wrote:
> Is anyone aware of free software that will process PDF documents into
> text streams? There is a PDF2HTML (with an XML option) that will
> create page-centric versions, but this does not really distinguish
> text from format. I want to ignore (or be able to treat separately)
> such things as headers, footnotes, tables, figures, and equations.
> (Note that even Google retains the page-centric view.)
>
> Thanks,
> Ken