[Corpora-List] XML parsers vs regex

Piotr Bański bansp at o2.pl
Mon Jun 30 16:44:42 UTC 2014


Dear Matías,

This topic sometimes briefly surfaces on the xml-dev list, before it
goes up in flames. You might want to check the archives at:

http://lists.xml.org/archives/xml-dev/

In most cases, the reply, naturally, begins with "it depends": for
trivial cases involving simple embedded markup, why not use regex; but
for more complex cases, you may want to start thinking in terms of the
complexity of the source (see Mike Maxwell's reply for starters) and the
complexity of what you want to retrieve. And please do not forget to
think about making your queries portable and verifiable/readable for
others, especially those of us who aren't regex geeks.
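To make the "it depends" concrete, here is a minimal sketch using only the Python standard library (`re` and `xml.etree.ElementTree`). The sample markup, the regex, and the function names (`words_regex`, `words_parser`, `nouns_parser`) are hypothetical, made up for illustration and not taken from this thread. The regex is fine on flat, predictable markup, but the same data written as equally well-formed XML with different quoting, a comment, a line break inside a tag, or an entity trips it up, while a parser absorbs those variations.

```python
import re
import xml.etree.ElementTree as ET

# Flat, predictable markup: the "trivial case" where a regex is fine.
simple = '<s><w pos="NN">dog</w> <w pos="VBZ">barks</w></s>'

# The same kind of data, equally well-formed XML, but with variations
# a parser handles for free: single quotes and an extra attribute
# placed first, a commented-out token, a line break inside a tag,
# and a character entity.
tricky = """<s>
  <w lemma='dog' pos='NN'>dog</w>
  <!-- <w pos="NN">cat</w> -->
  <w
     pos="VBZ">barks</w>
  <w pos="CC">&amp;</w>
</s>"""

# Hypothetical regex written against the layout of `simple`.
WORD_RE = re.compile(r'<w pos="([^"]+)">([^<]*)</w>')


def words_regex(xml_text):
    """Return (pos, word) pairs by pattern-matching the raw text."""
    return WORD_RE.findall(xml_text)


def words_parser(xml_text):
    """Return (pos, word) pairs from the parsed element tree.

    ElementTree skips comments by default and decodes entities.
    """
    root = ET.fromstring(xml_text)
    return [(w.get("pos"), w.text) for w in root.iter("w")]


def nouns_parser(xml_text):
    """Return the text of every <w> whose pos attribute is "NN".

    ElementTree supports a limited XPath subset, enough for a query
    that others can read without decoding a regex.
    """
    root = ET.fromstring(xml_text)
    return [w.text for w in root.findall(".//w[@pos='NN']")]


if __name__ == "__main__":
    print(words_regex(simple))   # [('NN', 'dog'), ('VBZ', 'barks')]
    print(words_parser(simple))  # [('NN', 'dog'), ('VBZ', 'barks')]
    print(words_regex(tricky))   # [('NN', 'cat'), ('CC', '&amp;')]
    print(words_parser(tricky))  # [('NN', 'dog'), ('VBZ', 'barks'), ('CC', '&')]
    print(nouns_parser(tricky))  # ['dog']
```

On `simple` both approaches agree. On `tricky` the regex misses two real tokens, picks up the commented-out one, and leaves the entity undecoded. Each case could be patched into the regex, but every patch makes the query harder for the next person to read and verify.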

Best regards,

  Piotr

On 30/06/14 13:55, Matías Guzmán Naranjo wrote:
> Dear all,
> 
> When working with XML-tagged corpora, I have always used regex to extract
> the information I need; I have never used XML parsers like NLTK's or any
> other. Is there an advantage to using parsers vs. using regex? Which?
> What do you personally use?
> 
> Best,
> 
> Matías
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
> 
