[Corpora-List] Convertor XML to TXT

Emiliano Guevara emiliano.guevara at unibo.it
Fri May 23 12:13:34 UTC 2008


Hi Souhir,

there are many ways to do this, but you will have to be more specific
about what you mean by "convert".

- If you are NOT interested at all in keeping the structured
information in your XML, then even a simple regex could do the job
(basically deleting all the tags and keeping the raw text).

- If you DO want to keep the structured information (or a part of it),
then you will need to parse the XML, find the elements that you want
to keep and print them out as you wish. You can do this by writing a
script in any programming language with good XML libraries, but I
think that Perl could give you a good start. Some examples/tutorials:

http://www.perlmonks.org/?node_id=46517
http://www.ibm.com/developerworks/xml/library/x-domprl/
http://articles.techrepublic.com.com/5100-10878_11-1044612.html
http://articles.techrepublic.com.com/5100-10878_11-5363190.html?tag=rbxccnbtr1

- Alternatively, you could apply a simple "transformation" with
XSL/XSLT (some XML editors let you do this straight away; I have used
Oxygen for this: http://www.oxygenxml.com/ )
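A minimal sketch of the first option (throwing the markup away), written in Python only for illustration; the name strip_tags and the sample sentence are made up. It assumes the file is small enough to read into memory and that a rough tag pattern is good enough: it does not understand comments, CDATA sections, or a literal ">" inside an attribute value.

```python
import html
import re

# Anything that looks like a tag, e.g. <s id="1">, </s> or <?xml ...?>.
# Rough heuristic only: comments, CDATA sections and a literal ">"
# inside an attribute value are NOT handled correctly.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(xml_text):
    """Delete all markup and keep only the raw text (structure is lost)."""
    # Replace each tag with a space so words in adjacent elements don't merge.
    text = TAG_RE.sub(" ", xml_text)
    # Collapse runs of whitespace (note: this also removes line breaks).
    text = " ".join(text.split())
    # Unescape entities only AFTER removing tags, so that escaped text
    # such as &lt;b&gt; is kept as literal "<b>" rather than deleted.
    return html.unescape(text)


if __name__ == "__main__":
    sample = '<doc><s id="1">Hello &amp; welcome</s><s id="2">to Sfax.</s></doc>'
    print(strip_tags(sample))  # prints: Hello & welcome to Sfax.
```

Replacing tags with a space rather than an empty string is a deliberate choice: otherwise "welcome" and "to" from neighbouring elements would be glued together.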

All of these involve some programming/coding; I don't know of any
pre-cooked tools for the task.
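For the second option (keeping only some elements), a minimal sketch using Python's standard xml.etree.ElementTree module instead of Perl. The script name xml2txt.py, the function xml_to_lines and the element name "s" are hypothetical; adapt the tag to your own corpus format. It prints the text of each chosen element on its own line.

```python
import sys
import xml.etree.ElementTree as ET


def xml_to_lines(root, tag):
    """Return the flattened text of every <tag> element under `root`.

    If the document declares a default namespace, `tag` must be given in
    Clark notation, e.g. "{http://example.org/ns}s" (hypothetical URI).
    Nested elements with the same tag would be output more than once.
    """
    lines = []
    for elem in root.iter(tag):
        # itertext() yields the text of the element and of all its
        # descendants, so inline markup such as <b>...</b> is flattened.
        text = " ".join("".join(elem.itertext()).split())
        if text:  # skip elements with no text at all
            lines.append(text)
    return lines


# Only runs when called from the shell with FILE.xml and TAG, e.g.:
#   python xml2txt.py corpus.xml s > corpus.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    root = ET.parse(sys.argv[1]).getroot()
    for line in xml_to_lines(root, sys.argv[2]):
        print(line)
```

Unlike the regex approach, a real parser handles entities, comments and CDATA for you, and lets you choose which elements survive in the TXT output.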

Good luck,

E.


On May 23, 2008, at 13:32 PM, souhir hajji wrote:

> Dear all,
> Does anyone know about any FREE convertor system that could be used
> to convert XML files to TXT files?
>
> Many thanks for your help.
>
> Souhir Hajji
> Master student
> MIRACL Laboratory
> Sfax, TUNISIA
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

****************************************
Emiliano R. Guevara
Facoltà di Lingue e Lett. Straniere
Dipart. di Lingue e Lett. Straniere
Università di Bologna
Via Cartoleria 5 (40124) Bologna, Italia
   http://morbo.lingue.unibo.it/
   emiliano.guevara at unibo.it
   emiguevara at gmail.com
****************************************
