[Corpora-List] NLP labs that have active projects on Persian

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu May 24 08:35:24 UTC 2012


Dear Hamid,

At the European Commission’s Joint Research Centre (JRC), we have developed the Europe Media Monitor (EMM) family of applications (http://emm.newsbrief.eu/overview.html), which covers Farsi among its languages.

EMM collects Farsi news (along with news in some 50 other languages) and displays it in EMM-NewsBrief and in EMM-MedISys (Medical Information System). If you go to ‘advanced search’, you can display all the news sources monitored. Farsi news items are then classified according to the many EMM categories and, if any match, displayed together with the items in the other languages.

In EMM-NewsExplorer (http://emm.newsexplorer.eu/NewsExplorer/home/fa/latest.html), we display the biggest news cluster of any given calendar day (for 20 languages, including Farsi), together with the information we manage to extract. We aim to extract entities (person and organisation names), geo-locations and quotations. We also try to link the Farsi news to news in (a subset of) the other languages and to news published on previous days.

NewsExplorer also accumulates the information it finds about entities over time and across many languages, and it displays this information on mixed-language pages (e.g. http://emm.newsexplorer.eu/NewsExplorer/entities/en/101358.html for Mahmoud Ahmadinejad).

I do not think our Farsi information extraction tools work particularly well, but we intend to put some more effort into the Farsi tools soon.

For an overview of the EMM applications, you can read:

Steinberger, Ralf, Bruno Pouliquen & Erik van der Goot (2009). An introduction to the Europe Media Monitor Family of Applications <http://langtech.jrc.ec.europa.eu/Documents/09_SIGIR-WS_Steinberger+frontmatter.pdf>. In: Fredric Gey, Noriko Kando & Jussi Karlgren (eds.): Information Access in a Multilingual World - Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR'2009), pp. 1-8. Boston, USA. 23 July 2009.

Greetings, currently from LREC in Istanbul, and best wishes for your interesting effort.

Ralf

Ralf Steinberger 

European Commission – Joint Research Centre (JRC)

URL of the lab: http://langtech.jrc.ec.europa.eu/

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Hamid Reza Ghader
Sent: 24 May 2012 10:01
To: corpora at uib.no
Subject: [Corpora-List] NLP labs that have active projects on Persian

Dear scientists,

We are compiling a list of all NLP labs around the world that have active projects on the Persian language. So I decided to ask you all to send me your lab name and homepage address if your lab has any project related to Persian. I would also appreciate a brief description of your Persian-related project.

Regards,
Hamidreza Ghader
Natural language and Text processing Laboratory
School of Electrical and Computer Engineering
University of Tehran
Iran
http://ece.ut.ac.ir/nlp/  
