[Corpora-List] CFP: PAN at FIRE-2012: CL!NSS Cross Language !ndian News Story Search

Parth Gupta pgupta at dsic.upv.es
Fri Jul 20 06:55:10 UTC 2012


Apologies for cross-posting

-------------------------------------------------------------------------------
Call for Participation
-------------------------------------------------------------------------------

PAN Track on
Cross-Language !ndian News Story Search

held in conjunction with FIRE 2012
(Forum for Information Retrieval Evaluation)
17-19 December 2012, Indian Statistical Institute, Kolkata
http://www.dsic.upv.es/grupos/nle/clinss.html

-------------------------------------------------------------------------------


This edition of the track, previously CL!TR and now CL!NSS, focuses on
journalistic text re-use. News agencies are a prolific source of text
on the Web and a valuable source of text in multiple languages. News
stories written by different authors, whether independently or derived
from another story, typically exist as separate entities; consequently,
there is a need to link them.

Linking news stories that cover the same event but are written in
different languages offers a number of benefits. For example, in a
multilingual environment such as India, where the same news story is
covered in multiple languages, a reader might want to refer to the
local-language version of a news story. News stories covering the same
event(s) in different languages may also be rich sources of both
parallel text (e.g. direct quotes or translation equivalents) and
comparable text (e.g. paraphrases). Identifying similar news stories
written in multiple languages therefore yields a valuable multilingual
resource. This is particularly important for Indian languages, for
which language resources for NLP and IR tasks are limited. For
instance, identifying comparable and parallel documents on the Web
would offer a potentially abundant source for deriving bilingual
dictionaries and training statistical MT systems (Munteanu & Marcu,
2005; Barker & Gaizauskas, 2012).

In this edition, the aim is to identify the same news story written in
multiple languages (English, Hindi and Gujarati), a problem of
cross-language news story detection. Given an English news story,
participants will identify and link the news stories in the Indian
languages Hindi and Gujarati that cover the same event.

We invite researchers and practitioners from all fields to participate.

References
1. Dragos Munteanu and Daniel Marcu (2005). Improving Machine
Translation Performance by Exploiting Comparable Corpora.
Computational Linguistics, 31(4), December 2005, pp. 477-504.
2. Emma Barker and Robert Gaizauskas (2012). Assessing the
Comparability of News Texts. In Proceedings of the Eighth
International Conference on Language Resources and Evaluation
(LREC'12).

-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------

03 Sep 2012    Release of training corpus
01 Oct 2012    Release of test corpus
21 Oct 2012    Submission of runs
19 Nov 2012    Release of qrels (result notification)
02 Dec 2012    Working notes due

-------------------------------------------------------------------------------
Track Coordinators
-------------------------------------------------------------------------------

Parth Gupta, Paolo Rosso
NLE Lab @ Universidad Politécnica de Valencia, Spain

Alberto Barrón-Cedeño
LSI @ Universitat Politècnica de Catalunya, Spain

Paul Clough, Mark Stevenson
IR & NLP Groups @ University of Sheffield, UK

Sobha Lalitha Devi
CLR Group @ AU-KBC Research Centre, Chennai, India


-------------------------------------------------------------------------------
Contact
-------------------------------------------------------------------------------

E-mail: clitr at dsic.upv.es
Track Web page: http://www.dsic.upv.es/grupos/nle/clinss.html



--
Parth Gupta,
PhD Student, NLE Lab,
Universidad Politécnica de Valencia (UPV), Spain
http://users.dsic.upv.es/~pgupta/

