[Corpora-List] Apply Coreference Resolution in Wikipedia

Gerber Daniel dgerber at informatik.uni-leipzig.de
Fri Apr 20 10:55:44 UTC 2012

I'm currently working on a distant supervision approach for relation extraction. I'm using the English Wikipedia articles to find sentences that contain labels of resources, for example a resource's name like "Barack Obama". My problem is that this string only occurs in the first couple of sentences of the article and is afterwards replaced by pronouns or by descriptions like "The president ...". So what I want to do is apply coreference resolution to the complete English Wikipedia (ideally also to other languages, such as German) and replace those substitutions with the resource name.
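To make the intended replacement step concrete, here is a minimal Python sketch. It assumes that some coreference resolver has already produced the character spans of all mentions that refer to the resource. The span format, the function name `substitute_mentions`, and the small possessive heuristic are illustrative assumptions and are not tied to any of the libraries mentioned in this message.

```python
from typing import List, Tuple

# Hypothetical input format: each mention is a (start, end) character span
# produced by whatever coreference resolver is used. No specific library is assumed.
Span = Tuple[int, int]

# Heuristic assumption: these pronouns are always possessive. "her" is left out
# because it is ambiguous between possessive and object use.
POSSESSIVES = {"his", "its", "their"}


def substitute_mentions(text: str, mentions: List[Span], label: str) -> str:
    """Replace every coreferent mention span in `text` with the resource label."""
    # Work from the last span to the first so that earlier offsets stay valid.
    for start, end in sorted(mentions, reverse=True):
        mention = text[start:end]
        if mention == label:
            continue  # already the full resource name
        replacement = label + "'s" if mention.lower() in POSSESSIVES else label
        text = text[:start] + replacement + text[end:]
    return text


def span_of(text: str, phrase: str) -> Span:
    """Helper for the example only: span of the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


text = ("Barack Obama was born in Hawaii. He served two terms. "
        "The president left office in 2017.")
mentions = [span_of(text, "He"), span_of(text, "The president")]
resolved = substitute_mentions(text, mentions, "Barack Obama")
print(resolved)
```

After this step every sentence of the example contains the label "Barack Obama", which is what the distant supervision sentence search needs. A real pipeline would take the spans from the resolver's coreference chains instead of `span_of`.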

Is there a corpus like this already available? If not, would I need to write this myself (using some library), or are there applications available that can do this?
Also, what would be a good library for this task in terms of speed and accuracy? I came across the Illinois Coreference Package, StanfordNLP, and OpenNLP, but I can't afford to try them all. :/

I would be very grateful for any suggestions!

Kind regards,