[Corpora-List] converting PDFs to ASCII or text-only files without clumps

Trevor Jenkins trevor.jenkins at suneidesis.com
Wed Jun 16 15:25:22 UTC 2010


On Wed, 16 Jun 2010, Richard Hussey <richardhussey42 at gmail.com> wrote:

> On 16 June 2010 15:44, Trevor Jenkins <trevor.jenkins at suneidesis.com> wrote:
> > Is that a viable solution? John McKenny described his source data as 30
> > years of journal papers. To feed the accumulated files through Reader
> > would be tedious to say the least.
>
> I did about 400 papers myself. ...

"You're a better man than I am, Gunga Din". I'd have given up on 40.

> ...  It was very dull...

Nah, truthfully I'd have given up at 4. That would have been enough for
"proof of concept" and then I would have ...

> I would advise another solution if it can be found, an automatic one
> would be best.  ;)

... looked for an automatic method amenable to being run as a batch
process. Probably going for pdf2text because I already have GhostScript
installed on all my workstations. Though, since PDF2Text is seemingly
part of Mac OS X, I would have given that a cursory glance too.
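
For what it's worth, the batch idea might look something like the rough
Python sketch below. It is only an illustration under assumptions: the
converter is taken to be a command-line tool called as
"<command> input.pdf output.txt" (Poppler's pdftotext accepts that form,
but any equivalent tool could be substituted), and the function names
plan_jobs and batch_convert are made up for this example.

```python
import subprocess
from pathlib import Path


def plan_jobs(src_dir, dst_dir):
    """Pair every PDF under src_dir with a .txt path under dst_dir,
    keeping the same sub-directory layout."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [
        (pdf, dst / pdf.relative_to(src).with_suffix(".txt"))
        for pdf in sorted(src.rglob("*.pdf"))
    ]


def batch_convert(src_dir, dst_dir, command=("pdftotext",)):
    """Run `command <in.pdf> <out.txt>` once per PDF.

    The default command is an assumption; pass a different tuple if
    another converter is installed. Returns a list of (pdf, message)
    pairs for the files that could not be converted.
    """
    failures = []
    for pdf, txt in plan_jobs(src_dir, dst_dir):
        txt.parent.mkdir(parents=True, exist_ok=True)
        try:
            result = subprocess.run(
                [*command, str(pdf), str(txt)],
                capture_output=True,
                text=True,
            )
        except OSError as exc:
            # e.g. the converter is not installed or not on the PATH
            failures.append((pdf, str(exc)))
            continue
        if result.returncode != 0:
            failures.append((pdf, result.stderr.strip()))
    return failures
```

Something like batch_convert("papers", "text") would then walk a
directory of papers unattended, and the returned list shows which files
still need attention by hand.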

Regards, Trevor

<>< Re: deemed!


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


