[Corpora-List] Looking for a XML to TEXT convertor/editor

Serge HEIDEN Slh at ens-lsh.fr
Tue Nov 28 18:06:03 UTC 2006


On Tuesday, November 28, 2006 4:13 PM [GMT+1=CET],
Alexandre Rafalovitch <arafalov at gmail.com> wrote:

>> Processing anything even a tiny bit more complex requires a big jump
>> in XML specific rules and workarounds. Solutions designed
>> specifically for XML are much better in the long run.

My mention of textonly, from the LT XML toolkit, was in the
same spirit. Being built on a native SGML and XML toolkit, textonly
can handle some XML-specific features; see, for example, some of its options.
Excerpt from 'man textonly':

usage: textonly [-d ddb-file] [-u base-url] [-t tag] [-s c]
                [-x] [file]

     -t <tag>
          If specified only text inside <tag ...> ... </tag> is
          printed. <tag> is the name of an SG/XML element.

     -s <str>
          If present, the STRING <str> (e.g. ' ' or "\^J") is
          printed between each bit of text.

     -x   If present, expand internal SDATA and numerical
          character references.
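
To make the effect of the -t and -s options concrete, here is a minimal
Python sketch that only approximates their behaviour using the standard
library's xml.etree.ElementTree. It is not how textonly is implemented.
The function name text_only and the sample document are made up for
illustration, and the parser expands character references on its own,
which corresponds only loosely to -x.

```python
import xml.etree.ElementTree as ET


def text_only(xml_source, tag=None, sep=""):
    """Return the text of an XML string, loosely like textonly -t/-s.

    Hypothetical helper for illustration; not part of the LT XML toolkit.
    """
    root = ET.fromstring(xml_source)
    # With a tag, keep only matching elements (at any depth, root included);
    # without one, keep the whole document.
    # Caveat: if matching elements are nested, their text is emitted twice.
    elements = root.iter(tag) if tag else [root]
    pieces = []
    for element in elements:
        for chunk in element.itertext():
            chunk = chunk.strip()
            if chunk:  # skip whitespace-only text nodes
                pieces.append(chunk)
    # The separator plays the role of the -s string.
    return sep.join(pieces)


sample = (
    "<doc><head>Title</head>"
    "<p>First &amp; <b>bold</b> text.</p>"
    "<p>Second&#233;</p></doc>"
)
print(text_only(sample, tag="p", sep=" "))
```

Calling text_only(sample) without a tag returns the text of the whole
document, including the <head> element.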

XmlStarlet, for its part, is based on libxml2, so it should be very
robust with respect to XML-specific features and various extensions.

Best,

-S
_____________________________________________________________
Serge Heiden, slh at ens-lsh.fr, https://weblex.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883


