[Corpora-List] XML parsers vs regex

Dave Graff graff at ldc.upenn.edu
Mon Jun 30 14:44:45 UTC 2014


When you need to select specific subsets of content based on the markup structure and/or the attribute values of certain tags, a parser (particularly one that supports XPath expressions) will give fully reliable results with the least effort, provided you've taken the time to learn XPath, which is probably easier, or at least no harder, than learning regexes.
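
As a rough illustration of selecting by attribute value, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The sample markup, the element name "w", the attributes "pos" and "lemma", and both function names are made up for this example and are not taken from any particular corpus.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical corpus fragment: element and attribute names are assumptions.
SAMPLE = """<corpus>
  <s id="s1">
    <w pos="DT" lemma="the">The</w>
    <w lemma="dog" pos="NN">dogs</w>
    <w pos="VB" lemma="bark">bark</w>
  </s>
  <s id="s2">
    <w pos="NN" lemma="cat">cats</w>
  </s>
</corpus>"""


def nouns_with_xpath(xml_text):
    """Return the text of every <w> element whose pos attribute is 'NN'."""
    root = ET.fromstring(xml_text)
    # Attribute order and whitespace inside the tag do not matter to the parser.
    return [w.text for w in root.findall(".//w[@pos='NN']")]


def nouns_with_regex(xml_text):
    """Naive regex version: silently assumes pos is the first attribute."""
    return re.findall(r'<w pos="NN"[^>]*>([^<]*)</w>', xml_text)


if __name__ == "__main__":
    print(nouns_with_xpath(SAMPLE))
    print(nouns_with_regex(SAMPLE))
```

On this sample the XPath version returns both nouns, while the naive regex misses "dogs" because its pos attribute is not the first one in the tag. A more careful regex could handle that case, but each such case has to be anticipated by hand.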

Speaking from my own experience (in Perl and Ruby), using a parser more often than not yields shorter scripts that are easier to maintain and adapt.

    Best regards,
	Dave Graff

On Jun 30, 2014, at 7:55 AM, Matías Guzmán Naranjo <mortem.dei at gmail.com> wrote:

> Dear all,
> 
> When working with XML-tagged corpora, I have always used regexes to extract the information I need; I have never used XML parsers like NLTK's or any other. Is there an advantage to using parsers over regexes? If so, which? What do you personally use?
> 
> Best,
> 
> Matías



