[Corpora-List] Looking for a XML to TEXT convertor/editor

Daniel Zeman zeman at ufal.mff.cuni.cz
Mon Nov 27 13:31:19 UTC 2006


If you have Perl on your machine (it is installed by default on most 
Linux systems), the attached Perl script may help. Call it like this:

tei2txt.pl < input.xml > output.txt

It strips the XML markup. At every "</p>" tag, it writes out the text 
collected so far as a single new line. You would have to modify the 
script if your XML does not contain p elements or if you want to break 
the lines elsewhere.
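
For readers without the attachment, the behaviour described above can be 
sketched roughly as follows. This is not the attached tei2txt.pl; it is a 
minimal Python approximation, and the function names xml_to_lines and main 
are made up for illustration. It assumes that tags never contain a literal 
">" inside attribute values, that comments and CDATA sections need no 
special handling, and that any text after the last "</p>" should also be 
written out as a final line. Tags are replaced by a space so that words 
from adjacent elements do not run together, which can leave a space before 
punctuation that follows an inline element.

```python
import re
import sys
from html import unescape


def xml_to_lines(xml_text):
    """Return a list of plain-text lines, one per closing </p> tag."""
    lines = []
    # Split the input at every closing </p> tag (hypothetical break point;
    # change the pattern to break lines at a different element).
    for chunk in re.split(r"</p\s*>", xml_text):
        # Replace every remaining tag with a space.
        text = re.sub(r"<[^>]*>", " ", chunk)
        # Decode character references such as &amp; and &lt;.
        text = unescape(text)
        # Collapse all whitespace, including newlines, to single spaces.
        text = " ".join(text.split())
        if text:
            lines.append(text)
    return lines


def main():
    """Filter: read XML from standard input, write text to standard output."""
    for line in xml_to_lines(sys.stdin.read()):
        print(line)
```

To use the sketch as a filter in the same way as the Perl script, add a 
call to main() at the end of the file and redirect input and output as in 
the command above.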

Best,
Dan

Federica Barbieri wrote:
> Dear List Members,
>
>
> For my dissertation research, I will need to convert several corpus files in 
> XML format into TEXT, so that I can process these files with some of the 
> programs for linguistic analysis that we have here at NAU, all of which are 
> designed to process text files (with line breaks).
>
> So, I am looking for a good, user-friendly XML to TEXT convertor or editor and 
> was wondering if anyone knows of any or has used any that they would 
> recommend.
>
> So far I've tried to use XML FoxAdvance (available at 
> http://xmlfox.com/index.htm). However, I've had no luck with the trial 
> version of this program, and the support has been unhelpful (they 
> suggested that I try some other product from one of their competitors...).
>
> I would appreciate any suggestions and I will post a summary if there is 
> interest.
>
> Thanks!
>
> Federica Barbieri
>
> *****************
> Federica Barbieri
> PhD Candidate in Applied Linguistics
> Department of English
> Northern Arizona University
> Liberal Arts Building, BOX 6032
> Flagstaff, AZ 86011-6032
>
> Office: BAA 322
> Tel: (928) 523 0291
> Fax: (928) 523 7074
> email: Federica.Barbieri at NAU.EDU
>
>   
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tei2txt.pl
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20061127/108ff37b/attachment-0001.pl>

