[Corpora-List] .doc to .txt converter

Martin Reynaert reynaert at uvt.nl
Fri Oct 26 15:46:00 UTC 2012


Dear Sara,

Over the past years I have settled on this solution:

$ abiword -t txt *.doc

This gives a plain-text output file with the extension .txt for each
TextFileName.doc. There are also options to keep or change the files'
character encoding, should you want or need to do that.

Like Tristan's solution, this is a command-line one. The $ in the
command stands for the command-line 'prompt' and is not typed.
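
For a batch of this size, a loop that handles one file at a time may be
easier to monitor than a single wildcard call. What follows is only a
minimal sketch: it reuses the abiword call from above, while DRY_RUN and
convert_all are names made up for the illustration and are not part of
AbiWord. By default it merely prints the commands it would run.

```shell
#!/bin/sh
# Sketch: convert each .doc file to .txt with AbiWord, one file per call.
# DRY_RUN and convert_all are illustrative names, not AbiWord features.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = really run them

convert_all() {
    for f in "$@"; do
        # Skip anything that is not a regular file, e.g. an unmatched pattern.
        [ -f "$f" ] || continue
        if [ "$DRY_RUN" = 1 ]; then
            printf 'abiword -t txt "%s"\n' "$f"
        elif ! abiword -t txt "$f"; then
            # Report the failure and carry on with the remaining files.
            printf 'FAILED: %s\n' "$f" >&2
        fi
    done
}

convert_all *.doc
```

Run it in the folder holding the .doc files; once the printed commands
look right, setting DRY_RUN=0 makes it call abiword for real.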

Google seems to tell me there are plenty of Windows versions of
AbiWord. If I had to do this right now, I would certainly compare the
output of LibreOffice with that of AbiWord.
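
One rough way to make such a comparison is to check which converted
files differ at all and then inspect those by hand. The sketch below
assumes the .txt output of each program has been collected in a folder
of its own; compare_outputs and the two folder names are hypothetical,
chosen only for this example.

```shell
#!/bin/sh
# Sketch: report .txt files whose contents differ between two folders.
# compare_outputs and the folder names below are hypothetical.

compare_outputs() {
    dir_a=$1
    dir_b=$2
    for a in "$dir_a"/*.txt; do
        [ -f "$a" ] || continue            # nothing matched the pattern
        name=$(basename "$a")
        b=$dir_b/$name
        if [ ! -f "$b" ]; then
            printf 'missing: %s\n' "$b"    # only the first tool produced it
        elif ! cmp -s "$a" "$b"; then
            printf 'differs: %s\n' "$name" # byte-level difference found
        fi
    done
}

# Example call, assuming both folders exist:
# compare_outputs out_abiword out_libreoffice
```

A byte-level check like this will also flag mere differences in line
endings or character encoding, so it only narrows down what to look at.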

As it is, I do not recall how one would go about doing this in a
GUI environment, unless one of the programs' menus offers an actual
option for batch conversion.

Seems you have several options. Do let us know which worked best ;0)

Regards,

Martin


On 10/26/2012 05:19 PM, Tristan Miller wrote:
> Dear Sara,
>
> On 26/10/12 02:03 PM, Sara Berlanda wrote:
>> can anybody advise me about a tool which can convert 3000
>> Word files (.doc) into 3000 .txt files at once? The
>> tool should run on Windows 7 platform.
> LibreOffice <http://www.libreoffice.org/> can do this when invoked from
> the command line with appropriate parameters.
>
> The following works for me on GNU/Linux, though the call should be
> similar or identical on Windows 7:
>
> libreoffice --headless --convert-to txt:text *.doc
>
> Regards,
> Tristan
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
