[Corpora-List] converting PDFs to ASCII or text-only files without clumps

Christian Chiarcos christian.chiarcos at web.de
Wed Jun 16 12:21:28 UTC 2010


Sorry for the confusion, the *more* in my mail was an artifact. No
comparison with Tika was intended. It referred to the original first line of
my mail, which mentioned ps2ascii, but I removed that line because
ps2ascii is not really an option, either for special characters or for
the clumps problem.

Christian

> *Comment off list*
>
> FYI: Tika provides an XHTML representation of the input. Just for my own
> interest, could you explain why you think it is a more suitable option?
>
> Thanks
