[Corpora-List] Looking for a XML to TEXT convertor/editor

Lou lou.burnard at computing-services.oxford.ac.uk
Mon Nov 27 09:40:23 UTC 2006


The easiest way to do this properly, provided the files are well-formed 
XML, is to use an XSLT stylesheet.

A stylesheet containing no templates at all will, by default, simply give 
you the text content of the input XML, because XSLT's built-in rules pass 
text nodes through to the output.

Try this:

1. download and install an XSLT processor such as xsltproc

2. create a file like the following

-------------------

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text" encoding="utf-8" />

  <xsl:template match="teiHeader"/>

  <xsl:template match="text">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>

-----------------

In this example, the content of any <teiHeader> element in the input 
will be suppressed, and the content of any <text> element will be passed 
through. If your document uses different names for the elements, you can 
edit the above as needed.
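
As a rough sketch of that kind of edit, suppose your files used <header> 
for the metadata and <body> for the running text. Those two names, the 
file names, and the tiny sample document below are all made up for 
illustration; substitute whatever your corpus actually uses.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Adapted stylesheet. ASSUMPTION: <header> and <body> are hypothetical
# element names standing in for <teiHeader> and <text>.
cat > "$workdir/variant.xsl" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text" encoding="utf-8" />
  <xsl:template match="header"/>
  <xsl:template match="body">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
EOF

# A tiny made-up input file using those names.
cat > "$workdir/sample.xml" <<'EOF'
<doc><header><title>Metadata</title></header><body><p>Hello corpus</p></body></doc>
EOF

# Run the transformation only if xsltproc is actually installed.
if command -v xsltproc >/dev/null 2>&1; then
  xsltproc "$workdir/variant.xsl" "$workdir/sample.xml" > "$workdir/sample.txt"
else
  echo "xsltproc not found; install it first" >&2
fi
```

The resulting sample.txt should contain only the text inside <body>, with 
the <header> content dropped.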

Run the stylesheet by referencing it from your XML files with an 
<?xml-stylesheet ...?> processing instruction or, more easily, by using a 
standalone processor such as xsltproc:

xsltproc mystylesheet.xsl myinputfile > myoutputfile
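
Since the question mentions several corpus files, the same command can be 
wrapped in a loop. This is only a minimal sketch: the function name 
convert_dir is invented here, and it assumes the inputs end in .xml and 
that a .txt file written next to each input is what you want.

```shell
# Convert every .xml file in a directory to a .txt file alongside it.
# Usage: convert_dir STYLESHEET DIRECTORY
# ASSUMPTION: convert_dir is a hypothetical helper, not part of xsltproc.
convert_dir() {
  stylesheet=$1
  dir=$2
  for f in "$dir"/*.xml; do
    # If no .xml files exist the pattern is left unexpanded; skip it.
    [ -e "$f" ] || continue
    # Strip the .xml suffix and write the text output to NAME.txt;
    # stop at the first file that xsltproc cannot process.
    xsltproc "$stylesheet" "$f" > "${f%.xml}.txt" || return 1
  done
}
```

For example, "convert_dir mystylesheet.xsl corpus" would process every 
.xml file in a directory called corpus (a hypothetical name).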


Federica Barbieri wrote:
> Dear List Members,
>
>
> For my dissertation research, I will need to convert several corpus files in 
> XML format into TEXT, so that I can process these files with some of the 
> programs for linguistic analysis that we have here at NAU, all of which are 
> designed to process text files (with line breaks).
>
> So, I am looking for a good, user-friendly XML to TEXT convertor or editor and 
> was wondering if anyone knows of any or has used any that they would 
> recommend.
>
> So far I've tried XML FoxAdvance (available at 
> http://xmlfox.com/index.htm). However, I've had no luck with the trial version 
> of this program and the support has been unhelpful (they suggested that I try 
> some other product by some of their competitors...).
>
> I would appreciate any suggestions and I will post a summary if there is 
> interest.
>
> Thanks!
>
> Federica Barbieri
>
> *****************
> Federica Barbieri
> PhD Candidate in Applied Linguistics
> Department of English
> Northern Arizona University
> Liberal Arts Building, BOX 6032
> Flagstaff, AZ 86011-6032
>
> Office: BAA 322
> Tel: (928) 523 0291
> Fax: (928) 523 7074
> email: Federica.Barbieri at NAU.EDU
>
>
>
>   


