[Corpora-List] Looking for a XML to TEXT convertor/editor
Serge HEIDEN
Slh at ens-lsh.fr
Tue Nov 28 18:06:03 UTC 2006
On Tuesday, November 28, 2006 4:13 PM [GMT+1=CET],
Alexandre Rafalovitch <arafalov at gmail.com> wrote:
>> Processing anything even a tiny bit more complex requires a big jump
>> in XML specific rules and workarounds. Solutions designed
>> specifically for XML are much better in the long run.
My mention of textonly, from the LT XML toolkit, was in the
same spirit. Because it is built on a native SGML and XML toolkit,
textonly can handle some XML-specific features. See, for example,
some of its options.
Excerpt from 'man textonly':

  usage: textonly [-d ddb-file] [-u base-url] [-t tag] [-s c]
                  [-x] [file]

  -t <tag>
      If specified only text inside <tag ...> ... </tag> is
      printed. <tag> is the name of an SG/XML element.

  -s <str>
      If present, the STRING <str> (e.g. ' ' or "\^J") is
      printed between each bit of text.

  -x  If present, expand internal SDATA and numerical character
      references.
Now, since XMLStarlet is based on libxml2, it should be very robust
with respect to XML-specific features and various extensions.
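
For readers without either tool at hand, the effect of the -t and -s
options quoted above can be approximated with a few lines of Python
and its standard xml.etree.ElementTree parser. This is only a rough
sketch, not a reimplementation of textonly: the function name
text_only, its parameters, and the sample document are made up for
illustration, whitespace-only fragments are dropped for readability,
and SGML input and SDATA entities are not handled. Numeric character
references are expanded by the XML parser itself.

```python
import xml.etree.ElementTree as ET


def text_only(xml_source, tag=None, sep=""):
    """Return the text content of an XML string.

    tag: if given, keep only text found inside elements with this name
         (loosely comparable to the -t option quoted above).
    sep: string placed between successive bits of text
         (loosely comparable to the -s option quoted above).

    Hypothetical helper for illustration only. Note that if <tag>
    elements are nested inside each other, their text is emitted
    more than once by this simple version.
    """
    root = ET.fromstring(xml_source)
    if tag is None:
        pieces = root.itertext()
    else:
        pieces = (t for el in root.iter(tag) for t in el.itertext())
    # Skip fragments that contain nothing but whitespace.
    return sep.join(p for p in pieces if p.strip())


# Made-up sample document; &#233; is a numeric character reference.
sample = (
    "<doc><head>Title</head>"
    "<p>Hello &#233;t&#233; <b>world</b></p>"
    "<p>Bye</p></doc>"
)

if __name__ == "__main__":
    print(text_only(sample))
    print(text_only(sample, tag="p", sep="|"))
```

Because the text is taken from a parsed tree rather than from the raw
character stream, entities, CDATA sections and comments are dealt with
by the parser, which is the point made in the quoted message about
solutions designed specifically for XML.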
Best,
-S
_____________________________________________________________
Serge Heiden, slh at ens-lsh.fr, https://weblex.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883