[Corpora-List] Tools for batch conversion Word to UTF-8.
Tino Didriksen
tino at didriksen.cc
Thu Feb 9 12:07:33 UTC 2012
Modern MS Word .docx files are ZIP archives containing XML documents, so
extracting plain text from them takes very little scripting.
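As a rough illustration of how little scripting that takes, here is a minimal Python sketch using only the standard library. It assumes the body text lives in the archive member word/document.xml (the usual layout) and writes UTF-8 .txt files next to the inputs. The function names docx_to_text and convert_all are hypothetical. Headers, footers and footnotes are stored in other archive members and are ignored here, and edge cases such as text boxes nested inside paragraphs are not handled carefully.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML main namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(path):
    """Return the plain text of a .docx body, one line per paragraph (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        parts = []
        # Walk the paragraph in document order, keeping text runs, tabs and breaks
        for node in para.iter():
            if node.tag == W_NS + "t" and node.text:
                parts.append(node.text)
            elif node.tag == W_NS + "tab":
                parts.append("\t")
            elif node.tag in (W_NS + "br", W_NS + "cr"):
                parts.append("\n")
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)

def convert_all(paths):
    """Write foo.txt (UTF-8) next to each foo.docx given (hypothetical helper)."""
    for p in paths:
        src = Path(p)
        src.with_suffix(".txt").write_text(docx_to_text(src), encoding="utf-8")

if __name__ == "__main__":
    convert_all(sys.argv[1:])
```

Usage, for example: python docx2txt.py *.docx (the script name is arbitrary).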
Older binary .doc files will need a trip through a conversion tool. You can
use OpenOffice/LibreOffice in headless mode for this, and OOo/LO's .doc
reader gets most of the format right.
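A minimal sketch of driving LibreOffice headless from Python for a batch of .doc files, assuming a LibreOffice version that supports --convert-to and that the "Text (encoded)" export filter with the UTF8 option is available in your install. The helper names build_command and convert_docs are hypothetical. Only the command construction is checked here. The actual conversion depends on the local LibreOffice setup.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Assumed filter string: LibreOffice "Text (encoded)" export, UTF-8 output
TXT_UTF8_FILTER = "txt:Text (encoded):UTF8"

def build_command(doc_files, outdir, soffice="soffice"):
    """Build the soffice command line for one batch (hypothetical helper)."""
    return [soffice, "--headless", "--convert-to", TXT_UTF8_FILTER,
            "--outdir", str(outdir), *map(str, doc_files)]

def convert_docs(doc_files, outdir):
    """Run LibreOffice headless over all files in one process (hypothetical helper)."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(doc_files, outdir, soffice), check=True)

if __name__ == "__main__":
    # usage: python doc2txt.py OUTDIR file1.doc file2.doc ...
    convert_docs(sys.argv[2:], sys.argv[1])
```

Passing all files to a single soffice invocation avoids paying LibreOffice's start-up cost once per document.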
-- Tino Didriksen
On Thu, Feb 9, 2012 at 12:38, Josep M. Fontana <josepm.fontana at upf.edu> wrote:
> Does anyone here know of a good free application to batch convert Word
> documents to UTF-8? (Linux, OS X or Windows, it doesn't matter)
>