[Corpora-List] Alias Detection Dataset

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Tue Oct 7 14:11:13 UTC 2014


Dear Ayah,

I am not entirely sure what you mean by “without the use of external resources”, but you may find JRC-Names helpful for your task. You can download it and integrate it into your application. JRC-Names is available at:

                https://ec.europa.eu/jrc/en/language-technologies/jrc-names

JRC-Names is a collection of several hundred thousand names and their variant spellings, including across scripts and languages. The name spellings were found by analysing almost 200,000 multilingual online news articles per day and by automatically merging spelling variants with previously known name spellings. For example, you will find 

Wladimir Putin, 

Vladimir Poutine, 

Vladímir Putin, 

Vlagyimir Putyin, 

فلاديمير بوتين 

and more as variant spellings of 

Владимир Путин.

JRC-Names is updated daily with newly found names and variant spellings. It is a by-product of the Europe Media Monitor <http://emm.newsbrief.eu/overview.html> family of applications.

Of course you can take the full names apart in order to work with name parts only (e.g. Vladimir).
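
To sketch what working with name parts could look like, the following Python snippet splits full names on whitespace and records which full names each part occurs in. The function name index_name_parts and the whitespace-based splitting are assumptions for illustration; splitting on spaces will not suit every script or naming convention.

```python
from collections import defaultdict

# A few of the Latin-script spellings quoted earlier in this message.
FULL_NAMES = [
    "Wladimir Putin",
    "Vladimir Poutine",
    "Vladímir Putin",
    "Vlagyimir Putyin",
]


def index_name_parts(full_names: list[str]) -> dict[str, set[str]]:
    """Map each whitespace-separated name part to the full names containing it."""
    parts: dict[str, set[str]] = defaultdict(set)
    for full_name in full_names:
        for part in full_name.split():
            parts[part].add(full_name)
    return parts


if __name__ == "__main__":
    parts = index_name_parts(FULL_NAMES)
    for part in sorted(parts):
        print(part, "->", sorted(parts[part]))
```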

I hope you find this resource useful.

All the best,

Ralf

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Ayah Zirikly
Sent: 07 October 2014 15:48
To: corpora at uib.no
Subject: [Corpora-List] Alias Detection Dataset

Hi,

I am trying to find datasets for alias detection (preferably in microblogs). The task I am interested in is: given free text, find the aliases of a named entity without using external resources. The entities don't have to be person names; they can be organizations or any other type of named entity.

Thanks a lot,

Ayah
