[Corpora-List] converting PDFs to ASCII or text-only files without clumps
Christian Chiarcos
christian.chiarcos at web.de
Wed Jun 16 12:21:28 UTC 2010
Sorry for the confusion: the *more* in my mail was an artifact. No
comparison with Tika was intended. It referred to the original first line
of my mail, which mentioned ps2ascii. I removed that line because
ps2ascii is not really an option, either for special characters or for
the clumps problem.
Christian
> *Comment off list*
>
> FYI: Tika provides an XHTML representation of the input. Just for my own
> interest, could you explain why you think it is a more suitable option?
>
> Thanks
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora