[Corpora-List] Looking for a XML to TEXT convertor/editor

Alexandre Rafalovitch arafalov at gmail.com
Tue Nov 28 15:13:50 UTC 2006


It seems to me that we are descending into methods that cater less
and less for the edge cases possible with XML. Certainly, sed or even perl
would only work if the XML is of the most primitive kind (e.g. only the
simplest tags, no named entities, etc.). Processing anything even a
tiny bit more complex requires a big jump in XML-specific rules and
workarounds. Solutions designed specifically for XML are much better
in the long run.

I initially recommended XMLStarlet as a more comprehensive
solution but, given the other options suggested, here is how to use it
for plain tag stripping while still taking XML special cases into account:
<location_xmlstarlet>\xml sel  -T -t -m / -v . xmlfile.xml
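
To illustrate the point about edge cases, here is a minimal sketch in
Python (not XMLStarlet itself) that contrasts a naive regular-expression
strip with a parser-based one. The sample document and both function names
are made up for this example, and the standard library's
xml.etree.ElementTree merely stands in for "a tool that understands XML";
its output is not guaranteed to match XMLStarlet's byte for byte.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a '>' inside an attribute value, an escaped
# ampersand, and a CDATA section. All three are legal XML.
SAMPLE = (
    '<doc><p title="a > b">Fish &amp; chips</p>'
    '<p><![CDATA[1 < 2]]></p></doc>'
)


def strip_tags_regex(xml_text):
    """Naive approach: delete anything that looks like <...>."""
    return re.sub(r"<[^>]*>", "", xml_text)


def strip_tags_parser(xml_text):
    """Parse the document, then join the text of every node in order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


if __name__ == "__main__":
    # Leaves attribute debris and the raw &amp;, and loses the CDATA text.
    print(repr(strip_tags_regex(SAMPLE)))
    # Decodes the entity and keeps the CDATA content.
    print(repr(strip_tags_parser(SAMPLE)))
```

On this sample the regular expression cuts the first tag short at the '>'
inside the attribute, leaves "&amp;" undecoded, and throws away the CDATA
text, while the parser handles all three. Note that ElementTree only knows
the five predefined entities; a document using other named entities would
still need its DTD or a different parser.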

Regards,
   Alex.

On 11/28/06, Notis Toufexis <notis.toufexis at gmail.com> wrote:
> This one is for all who are not into sed, perl etc.
>
> Jedit's (Java based text editor, www.jedit.org) XML plugin has a "Remove all
> tags" command.
>
> It might win the prize for the fastest way to do it, too.


