[Corpora-List] XML parsers vs regex

Alberto Simões albie at alfarrabio.di.uminho.pt
Mon Jun 30 12:34:49 UTC 2014


Dear Matías,

It depends a lot on the kind of processing you are trying to do.
If you just want to search for a specific expression, a regexp might be
fast enough.
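
As a rough illustration of that first case, here is a minimal Python sketch. The sample fragment, its element and attribute names, and the helper find_expression are all made up for the example; the point is only that the search runs over the raw text and ignores the markup.

```python
import re

# Hypothetical corpus fragment; element and attribute names are assumptions.
sample = '<s><w pos="NN">corpus</w> <w pos="VB">matches</w> <w pos="NN">corpora</w></s>'

def find_expression(text, pattern):
    """Return every substring matched by pattern, treating the XML as plain text."""
    return re.findall(pattern, text)

# Finds word forms starting with "corp", wherever they occur in the file.
hits = find_expression(sample, r"corp\w+")
```

Note that the same pattern would also match inside tag names or attribute values if they happened to contain the string, which is where the structure starts to matter.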

If you want to use the XML structure (for example, to match only inside
a specific element type, or only when some XML structure is present), an
XML parser might help.
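
A small sketch of that second case, using Python's standard xml.etree.ElementTree module. The markup (text, head, s and w elements with a pos attribute) and the helper words_in are assumptions for illustration, not any particular corpus format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same word appears in a heading and in a sentence.
sample = (
    '<text><head><w pos="NN">corpus</w></head>'
    '<s><w pos="NN">corpus</w> <w pos="VB">grows</w></s></text>'
)

def words_in(xml_string, parent_tag, pos):
    """Return the text of <w> elements with the given pos, but only inside parent_tag."""
    root = ET.fromstring(xml_string)
    return [
        w.text
        for parent in root.iter(parent_tag)
        for w in parent.iter("w")
        if w.get("pos") == pos
    ]

# Only the noun inside <s> is returned; the one inside <head> is skipped.
nouns_in_sentences = words_in(sample, "s", "NN")
```

Doing the same restriction with a regexp is possible, but it gets fragile as soon as elements nest or attributes change order.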

Also note that for processing big XML files, a common DOM-oriented XML
parser might take too much memory, because it builds the whole document
tree in memory. So, in some situations it is relevant to choose the type
of XML parser as well (DOM vs SAX).
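
To make the SAX side concrete, here is a minimal sketch with Python's standard xml.sax module. The handler class TagCounter, the helper count_elements and the sample data are hypothetical; the idea is that the parser calls the handler as it reads, so no full tree is ever built.

```python
import io
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Count start tags with a given name while the document streams past."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is stored except the counter.
        if name == self.tag:
            self.count += 1

def count_elements(source, tag):
    """Parse source (a filename or a file-like object) and count matching elements."""
    handler = TagCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count

# An in-memory stand-in for a large corpus file.
sample = io.BytesIO(b"<text><s><w>a</w><w>b</w></s><s><w>c</w></s></text>")
w_count = count_elements(sample, "w")
```

With a real corpus you would pass the file name instead of the BytesIO object. The price is that you have to keep track of any context (which element you are inside, and so on) yourself.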

Hope this helps,
Best,
Alberto

On 30/06/14, 12:55, Matías Guzmán Naranjo wrote:
> Dear all,
>
> When working with XML-tagged corpora I have always used regex to extract
> the information I need; I have never used XML parsers like nltk's or any
> other. Is there an advantage to using parsers vs using regex? If so, which?
> What do you personally use?
>
> Best,
>
> Matías
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
