[Corpora-List] Looking for an XML to TEXT converter/editor
Daniel Zeman
zeman at ufal.mff.cuni.cz
Mon Nov 27 13:31:19 UTC 2006
If you have Perl on your machine (it is installed by default on most Linux
systems), the attached Perl script could help you. Call it like this:
tei2txt.pl < input.xml > output.txt
It strips the XML markup, and at every "</p>" tag it outputs the text
collected so far as a single new line. You would have to modify the script if
your XML does not contain p elements or if you want to break the lines elsewhere.
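For anyone without Perl, the behaviour described above (drop all markup, emit the collected text as one line at every "</p>") can be sketched in Python with the standard-library xml.sax parser. This is a hypothetical reimplementation, not the attached tei2txt.pl: the names ParagraphLines, xml_to_lines and main are made up for illustration, and the sketch assumes well-formed XML whose paragraph elements are named p (with or without a namespace prefix). Whitespace inside a paragraph is collapsed so that each paragraph comes out as exactly one line.

```python
import sys
import xml.sax


class ParagraphLines(xml.sax.ContentHandler):
    """Hypothetical handler: collect text, write it as one line at every </p>."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.buffer = []

    def characters(self, content):
        # Only text content reaches this method, so the markup is dropped.
        self.buffer.append(content)

    def endElement(self, name):
        # Break lines only at the end of p elements; compare the local name
        # so that a prefixed element such as "tei:p" also counts.
        if name.split(":")[-1] == "p":
            self.flush()

    def endDocument(self):
        # Assumption: any text after the last </p> is emitted as a final line.
        self.flush()

    def flush(self):
        # Collapse runs of whitespace (including the original line breaks)
        # into single spaces; skip paragraphs that contain no text.
        line = " ".join("".join(self.buffer).split())
        if line:
            self.out.write(line + "\n")
        self.buffer = []


def xml_to_lines(source, out):
    """Parse XML from a file name or binary file object and write lines to out."""
    xml.sax.parse(source, ParagraphLines(out))


def main():
    # Read XML from standard input and write plain text to standard output.
    xml_to_lines(sys.stdin.buffer, sys.stdout)
```

Saved to a file (for example the hypothetical xml2txt.py) with a final main() call added, it can be used the same way as the Perl script: python3 xml2txt.py < input.xml > output.txt. A real XML parser is used here rather than pattern matching on tags so that entities such as &amp; are decoded; the trade-off is that malformed input raises an error instead of being passed through. In this sketch, any text before the first "</p>" (a TEI header, for instance) is merged into the first output line.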
Best,
Dan
Federica Barbieri wrote:
> Dear List Members,
>
>
> For my dissertation research, I will need to convert several corpus files in
> XML format into plain text, so that I can process these files with some of the
> programs for linguistic analysis that we have here at NAU, all of which are
> designed to process text files (with line breaks).
>
> So, I am looking for a good, user-friendly XML-to-text converter or editor and
> was wondering if anyone knows of one, or has used one, that they would
> recommend.
>
> So far I've tried to use XML FoxAdvance (available at
> http://xmlfox.com/index.htm). However, I've had no luck with the trial version
> of this program, and the support has been unhelpful (they suggested that I try
> a product from one of their competitors...).
>
> I would appreciate any suggestions and I will post a summary if there is
> interest.
>
> Thanks!
>
> Federica Barbieri
>
> *****************
> Federica Barbieri
> PhD Candidate in Applied Linguistics
> Department of English
> Northern Arizona University
> Liberal Arts Building, BOX 6032
> Flagstaff, AZ 86011-6032
>
> Office: BAA 322
> Tel: (928) 523 0291
> Fax: (928) 523 7074
> email: Federica.Barbieri at NAU.EDU
>
>
Name: tei2txt.pl
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20061127/108ff37b/attachment-0001.pl>