[Corpora-List] XML parsers vs regex
Alberto Simões
albie at alfarrabio.di.uminho.pt
Mon Jun 30 12:34:49 UTC 2014
Dear Matías,
It depends a lot on the kind of processing you are trying to do.
If you only want to search for a specific expression, a regex might be
fast enough.
If you want to use the XML structure (for example, to match only inside
a specific element type, or only when some XML structure is present), an
XML parser might help.
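
As a rough illustration of that difference, here is a minimal Python
sketch using only the standard library. The sample markup, the element
and attribute names (s, w, note, pos) and the function names are made up
for the example; a real corpus will look different.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical corpus fragment; element and attribute names are invented.
SAMPLE = """<corpus>
  <s id="1"><w pos="NN">house</w> <w pos="VB">is</w></s>
  <note>the house tag is not a token</note>
</corpus>"""


def regex_hits(text, word):
    """Count whole-word matches anywhere in the raw text, markup included."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


def parser_hits(text, word, pos=None):
    """Return <w> elements whose text is `word`, optionally filtered by pos."""
    root = ET.fromstring(text)
    return [
        w for w in root.iter("w")
        if w.text == word and (pos is None or w.get("pos") == pos)
    ]


if __name__ == "__main__":
    print(regex_hits(SAMPLE, "house"))        # counts the <note> text too
    print(len(parser_hits(SAMPLE, "house")))  # counts only <w> tokens
```

The regex also counts the word inside the note element, while the parser
version only looks at w elements and can filter on their attributes.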
Also note that for processing big XML files a common DOM-oriented XML
parser might take too much memory, because it builds the whole document
tree before you can query it. So, in some situations it is also relevant
to choose the type of XML parser (DOM vs SAX).
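
To make the DOM vs SAX contrast concrete, here is a small sketch with
Python's standard xml.dom.minidom and xml.sax modules. The sample data
and the names count_dom, count_sax and TagCounter are assumptions made up
for the example, not part of any corpus toolkit.

```python
import io
import xml.sax
from xml.dom import minidom

# Hypothetical tiny corpus; a real file would be far larger.
SAMPLE = b"<corpus><s><w>a</w><w>b</w></s><s><w>c</w></s></corpus>"


def count_dom(data, tag):
    """DOM: build the whole tree in memory, then query it."""
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName(tag))


class TagCounter(xml.sax.ContentHandler):
    """SAX handler: sees one event at a time and keeps only a counter."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_sax(stream, tag):
    """SAX: stream through the input without building a tree."""
    handler = TagCounter(tag)
    xml.sax.parse(stream, handler)
    return handler.count


if __name__ == "__main__":
    print(count_dom(SAMPLE, "w"))
    print(count_sax(io.BytesIO(SAMPLE), "w"))
```

Both functions give the same count, but the SAX version never holds the
whole document at once, which is what matters for very large files.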
Hope this helps,
Best,
Alberto
On 30/06/14, 12:55, Matías Guzmán Naranjo wrote:
> Dear all,
>
> When working with XML-tagged corpora I have always used regex to extract
> the information I need; I have never used XML parsers like nltk's or any
> other. Is there an advantage to using parsers vs using regex? If so,
> which? What do you personally use?
>
> Best,
>
> Matías
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>