Dear CORPORA mailing list members,<br><br>Do any of you know of any
tool for extracting text specifically from Wikipedia articles, besides
those for extracting text from HTML pages?<br><br>I only need the title
and the body text, without any of the standard elements that appear in every
Wikipedia article (such as "From Wikipedia, the free encyclopedia",
"This article is about ...", [edit], the list of languages, "Main
article:", "Categories:") and without the "Contents", "See also",
"References", "Notes" and "External links" sections.<br>
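<br>To make the requirements concrete, here is a rough Python sketch of the filtering I mean. It assumes the article has already been turned into plain text by a generic HTML extractor. The function name strip_wikipedia_boilerplate, the exact marker strings, and the rule that the table of contents ends at the first blank line are my own assumptions, not features of any existing tool:<br>

```python
import re

# Assumed wording of the rendered page; lines matching these are dropped.
DROP_EXACT = {"From Wikipedia, the free encyclopedia"}
DROP_PREFIXES = ("This article is about", "Main article:", "Categories:")
# Section headings after which everything is discarded.
STOP_HEADINGS = {"See also", "References", "Notes", "External links"}
# "[edit]" links that follow section headings in the rendered text.
EDIT_MARK = re.compile(r"\s*\[edit\]")


def strip_wikipedia_boilerplate(title: str, text: str) -> str:
    """Return the title followed by the article body with boilerplate removed.

    Hypothetical helper: it works on plain text already extracted from HTML
    and does NOT handle the list of languages.
    """
    kept = []
    in_contents = False
    for raw in text.splitlines():
        line = EDIT_MARK.sub("", raw).strip()
        if in_contents:
            # Assumption: the table of contents ends at the first blank line.
            if not line:
                in_contents = False
            continue
        if line == "Contents":
            in_contents = True
            continue
        if line in STOP_HEADINGS:
            break  # drop this section and everything after it
        if line in DROP_EXACT or line.startswith(DROP_PREFIXES):
            continue
        kept.append(line)
    body = "\n".join(kept).strip()
    return f"{title}\n\n{body}"


sample = (
    "From Wikipedia, the free encyclopedia\n"
    "This article is about the city.\n"
    "Contents\n1 History\n2 See also\n\n"
    "Paris is the capital of France.\n"
    "History [edit]\n"
    "Main article: History of Paris\n"
    "Paris was founded early.\n"
    "See also [edit]\nLyon\nCategories: Cities"
)
print(strip_wikipedia_boilerplate("Paris", sample))
```

Heuristics like these break easily (the language list, for example, is not handled at all), which is why I am hoping for a dedicated tool.<br>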
<br>Can you give me any suggestions?<br><br>Thank you very much in
advance,<br><br>Irina<br><br><pre cols="72">Irina Temnikova<br><br>PhD Student in Computational Linguistics<br>Editorial Assistant for the Journal of Natural Language Engineering<br>Research Group in Computational Linguistics<br>
Research Institute of Information and Language Processing<br>University of Wolverhampton, UK</pre><br>-- <br>If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea. (Antoine de Saint-Exupery)<br>