[Corpora-List] .doc to .txt converter

Florian Petran florian.petran at gmail.com
Fri Oct 26 13:34:15 UTC 2012


If you only need the plain text from the .doc files, antiword is
probably the best fit. It is a command-line tool that prints a
document's text to standard output.

Windows binary:
http://www-stud.rbi.informatik.uni-frankfurt.de/~markus/antiword/antiword-0_37-windows.zip
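For the batch case (one .txt per .doc), here is a minimal sketch in
Python. It assumes antiword is installed and reachable on the PATH as
"antiword"; the function names and folder layout are made up for
illustration, and the output is written as whatever bytes antiword
emits. Check the README in the zip for where the Windows build expects
its mapping files.

```python
import subprocess
from pathlib import Path


def antiword_to_text(doc_path, exe="antiword"):
    """Run antiword on one .doc file and return the bytes it prints to stdout.

    Assumption: `exe` names an antiword binary that is on the PATH.
    """
    result = subprocess.run([exe, str(doc_path)], capture_output=True, check=True)
    return result.stdout


def convert_folder(src_dir, dst_dir, extract=antiword_to_text):
    """Write one .txt file to dst_dir for every .doc file in src_dir.

    `extract` is any callable that takes a path and returns bytes, so the
    converter can be swapped out. Returns the list of paths written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src.iterdir()):
        # Skip subfolders and anything that is not a .doc (case-insensitive).
        if not doc.is_file() or doc.suffix.lower() != ".doc":
            continue
        out = dst / (doc.stem + ".txt")
        out.write_bytes(extract(doc))
        written.append(out)
    return written
```

Calling convert_folder(r"C:\docs", r"C:\txt") would then process the
whole folder in one go (both paths are placeholders). Because check=True
is set, a file antiword cannot read raises an error and stops the run
instead of silently producing an empty .txt.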

On Fri, Oct 26, 2012 at 2:44 PM, Craig Pfeifer <craig.pfeifer at gmail.com> wrote:
> Depending on what is in the documents (whether they are just text,
> text + pictures, text with fancy layouts, etc.) the Apache Tika
> project may be just what you need:
>
> http://tika.apache.org/
> ______________
> craig.pfeifer at gmail.com
>
>
> On Fri, Oct 26, 2012 at 8:03 AM, Sara Berlanda
> <berlanda at uni-hildesheim.de> wrote:
>> Dear all,
>> can anybody recommend a tool that can convert 3000 Word files (.doc)
>> into 3000 .txt files in one go? The tool should run on Windows 7.
>> Thank you in advance,
>> Sara
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora



