[Corpora-List] Extracting text from Wikipedia articles

Srinivas Gokavarapu srinivasg at research.iiit.ac.in
Fri Aug 27 18:23:52 UTC 2010


Hi,

Wikipedia Miner is a tool for extracting information from Wikipedia:
http://wikipedia-miner.sourceforge.net/ Have a look at it.
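
For the section filtering described in the request quoted below, here is a minimal Python sketch. It is not part of Wikipedia Miner and does not use its API. It assumes you already have the raw wikitext of an article (for example from a database dump), with sections marked in the usual `== Heading ==` form. The names `SKIP_SECTIONS` and `strip_sections` are made up for illustration, and the sketch does not touch templates, hatnotes, or category links.

```python
import re

# Section headings to drop, taken from the request quoted below (assumed list).
SKIP_SECTIONS = {"contents", "see also", "references", "notes", "external links"}

# Matches a wikitext heading line such as "== See also ==" (levels 2 to 6).
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def strip_sections(wikitext, skip=SKIP_SECTIONS):
    """Return wikitext with the unwanted sections (and their subsections) removed."""
    kept = []
    skip_level = None  # heading level of the section currently being dropped
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2).strip().lower()
            if skip_level is not None and level > skip_level:
                continue  # heading of a subsection inside a dropped section
            # A heading at the same or a higher level ends the dropped section.
            skip_level = level if title in skip else None
            if skip_level is not None:
                continue  # heading of a section we want to drop
        elif skip_level is not None:
            continue  # body line inside a dropped section
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "Intro text.\n== History ==\nSome history.\n== See also ==\n* Other page"
    print(strip_sections(sample))
```

The article title would have to come from wherever the wikitext was obtained; the function only handles the body text.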


Srinivas.

On Fri, Aug 27, 2010 at 11:22 PM, Irina Temnikova <irina.temnikova at gmail.com> wrote:

> Dear CORPORA mailing list members,
>
> Do any of you know of any tool for extracting text specifically from
> Wikipedia articles, besides those for extracting text from HTML pages?
>
> I only need the title and the text, without any of the formal elements
> present in every Wikipedia article (such as "From Wikipedia, the free
> encyclopedia", "This article is about ...", [edit], the list of
> languages, "Main article:", "Categories:") and without "Contents", "See also",
> "References", "Notes" and "External links".
>
> Can you give me any suggestions?
>
> Thank you very much in advance,
>
> Irina
>
> Irina Temnikova
>
> PhD Student in Computational Linguistics
> Editorial Assistant for the Journal of Natural Language Engineering
> Research Group in Computational Linguistics
>
>
>
> Research Institute of Information and Language Processing
> University of Wolverhampton, UK
>
>
> --
> If you want to build a ship, don't drum up the men to gather wood, divide
> the work and give orders. Instead, teach them to yearn for the vast and
> endless sea. (Antoine de Saint-Exupery)
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
G.R.J.Srinivas
OBH 62
IIIT Hyderabad
9492756712