[Corpora-List] 2nd CFP PAN at FIRE-2013: Cross Language !ndian News Story Search (CL!NSS)
Parth Gupta
pgupta at dsic.upv.es
Fri Jul 12 10:17:31 UTC 2013
Dear all,
As you may already know, a related PAN activity is also being organized at FIRE
(Forum for Information Retrieval Evaluation) on cross-language
similarity search in journalistic text reuse. The task is to link
news stories covering the same event across languages (English and
Hindi), with the longer-term goal of finding derived/parallel content
among them. The training corpus is already available. Please find the
CFP below and consider participating.
Regards,
Parth Gupta,
On behalf of the PAN at FIRE Organizing Committee,
http://www.dsic.upv.es/grupos/nle/clinss.html
Apologies for cross-posting
-------------------------------------------------------------------------------
2nd Call for Participation
-------------------------------------------------------------------------------
PAN Track on
Cross-Language !ndian News Story Search (CL!NSS)
held in conjunction with FIRE 2013, the Forum for Information Retrieval
Evaluation
4 - 6 December 2013, New Delhi, India
http://www.dsic.upv.es/grupos/nle/clinss.html
-------------------------------------------------------------------------------
As in the previous year, this edition of CL!NSS focuses on journalistic
text reuse. News agencies are a prolific source of text on the Web and a
valuable source of text in multiple languages. News stories written
by different authors, whether independently or derived from another
story, typically exist as separate entities; consequently, there is
a need to link them.
Linking news stories that cover the same events but are written in
different languages offers a number of benefits. For example, in a
multilingual environment such as India, where the same news story is
covered in multiple languages, a reader might want to refer to the
local-language version of a news story. News stories covering the same
event(s) but published in different languages may also be rich sources
of both parallel text (e.g. direct quotes or translation equivalents)
and comparable text (e.g. paraphrases). Therefore, identifying similar
news stories written in multiple languages yields a valuable
multilingual resource. For Indian languages, only limited resources
exist for NLP and IR tasks, so identifying comparable and parallel
documents on the Web would offer a potential (and abundant) source for
deriving bilingual dictionaries and training statistical MT systems
(Munteanu & Marcu, 2005; Barker & Gaizauskas, 2012).
The aim of this edition is to identify the same story written in
different languages (English and Hindi), a problem of cross-language
news story detection. Given an English news story, participants will
identify and link the Hindi news stories that cover the same event.
We invite researchers and practitioners from all fields to participate.
References
1. Dragos Munteanu and Daniel Marcu (2005). Improving Machine
   Translation Performance by Exploiting Comparable Corpora.
   Computational Linguistics, 31(4), pp. 477-504, December 2005.
2. Emma Barker and Robert Gaizauskas (2012). Assessing the
   Comparability of News Texts. In Proceedings of the Eighth
   International Conference on Language Resources and Evaluation (LREC'12).
-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
6 May 2013          Release of training corpus (training period starts)
1 September 2013    Release of test corpus
20 September 2013   Submission of runs
1 November 2013     Release of qrels (result notification)
15 November 2013    Working notes due
-------------------------------------------------------------------------------
Task Coordinators
-------------------------------------------------------------------------------
Parth Gupta, Paolo Rosso
NLE Lab @ Universitat Politècnica de València, Spain
Paul Clough, Mark Stevenson
IR & NLP Groups @ University of Sheffield, UK
Rafael E. Banchs
HLT, Institute for Infocomm Research, Singapore
-------------------------------------------------------------------------------
Contact
-------------------------------------------------------------------------------
E-mail: clinss at dsic.upv.es
Track Web page: http://www.dsic.upv.es/grupos/nle/clinss.html