[Corpora-List] Extract plain text from Wikipedia dump XML format
Motaz SAAD
motaz.saad at inria.fr
Fri Jun 22 09:03:26 UTC 2012
Hello,
You can search Google for a wiki2plaintext script; versions exist in both Perl and Python.
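
In case it helps, here is a minimal sketch in Python of what such a script roughly does. It is not the wiki2plaintext script itself: the function names (iter_pages, strip_markup) and the sample page are made up for illustration, and the regexes only cover the most common markup (templates, links, refs, bold/italic, headings). Tables, nested links in image captions and similar cases would need a real parser.

```python
import io
import re
import xml.etree.ElementTree as ET


def strip_markup(wikitext):
    """Very rough wikitext-to-plain-text conversion using regexes."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)        # HTML comments
    text = re.sub(r"<ref[^>]*/>", "", text)                            # <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)   # <ref>...</ref>
    # Remove innermost {{...}} templates repeatedly so nested ones go too.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # [http://example.org label] -> label
    text = re.sub(r"\[https?://\S+\s+([^\]]*)\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)                                  # bold / italic
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"<[^>]+>", "", text)                                # leftover tags
    return text.strip()


def iter_pages(source):
    """Yield (title, wikitext) pairs while streaming through the dump."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]   # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text":
            yield title, elem.text or ""
        elif tag == "page":
            elem.clear()                    # free the finished page


# Tiny made-up sample standing in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Corpus</title>
    <revision>
      <text>'''Corpus''' is a [[text corpus|collection]] of texts.{{citation needed}}</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(page_title)
        print(strip_markup(wikitext))
```

For a real dump you would pass the path of the decompressed XML file (or an open bz2.open(...) file object) to iter_pages instead of the sample string.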
Best,
Motaz
----- Original Message -----
> From: "Rahma Sellami" <rahma.sellami at gmail.com>
> To: corpora at uib.no
> Sent: Wednesday, June 20, 2012 7:46:05 PM
> Subject: [Corpora-List] Extract plain text from Wikipedia dump XML format
> Hello,
> I downloaded a Wikipedia dump in XML format, and I want to remove the
> Wikipedia markup to extract the plain text.
> I found the tool wikiprep and installed it, but I do not know which
> script removes the Wikipedia markup.
> Thanks --
> RAHMA Sellami
> PhD Computer Science Student
> http://sites.google.com/site/rahmasellami/
> Faculty of Economic Sciences and management of Sfax
> ANLP Research Group
> http://sites.google.com/site/anlprg
> MIRACL Laboratory
> www.miracl.rnu.tn
> Email: rahma.sellami at gmail.com
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora