[Corpora-List] CLEF 2012 Labs - Registration is now open

Thu Mar 1 11:16:49 UTC 2012

**Apologies if you receive multiple copies. Please disseminate as appropriate **

************************************************************************************************************************************************
NEW! Registration for the CLEF 2012 Labs is now open.

Please register for the Lab(s) you are interested in by filling in the form at http://clef2012.org/index.php?page=Pages/registrationForm.php

************************************************************************************************************************************************

CLEF 2012 Conference and Labs of the Evaluation Forum
Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics

http://clef2012.org/

CLEF 2012 Labs - Call for Participation -

CLEF 2012 is this year's edition of the popular CLEF campaign and workshop series (http://www.clef-initiative.eu//), which has run since 2000, contributing to the systematic evaluation of information access systems, primarily through experimentation on shared tasks.
In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops. Labs fall under two types: laboratories to conduct evaluation of information access systems, and workshops to discuss and pilot innovative evaluation activities.
In 2012, CLEF will take place on September 17-20 in Rome, and researchers and practitioners from all segments of the information access and related communities are invited to participate in the following Evaluation Labs:

CHiC - Cultural Heritage in CLEF 
The CHiC 2012 pilot evaluation lab aims at moving towards a systematic and large-scale evaluation of cultural heritage digital libraries and information access systems. Data test collections and queries will come from the cultural heritage domain (in 2012, data from Europeana), and tasks will contain a mix of conventional system-oriented evaluation scenarios (e.g. ad-hoc retrieval and semantic enrichment) for comparison with other domains and a uniquely customized scenario for the CH domain, i.e. a variability task that presents a particularly good overview ("must sees") of the different object types and categories in the collection, targeted towards a casual user.
Lab Coordinators: Berlin School of Library and Information Science, Humboldt-Universität zu Berlin (DE); Department of Information Engineering, U. of Padova (IT); Royal School of Library and Information Science, Copenhagen (DK); The Information School, U. of Sheffield (UK); Europeana, The Hague, Netherlands (NL)
Lab Webpage: http://www.promise-noe.eu/chic-2012/home

CLEF-IP - IR in the IP domain
The CLEF-IP lab provides a large collection of XML documents representing patents and patent images. On this collection we organize the following four tasks:
- Passage Retrieval starting from claims: Starting from a given claim, participants are asked to retrieve relevant documents in the collection and mark the relevant passages in these documents.
- Matching Claim to description in a single document (Pilot): Starting from the claims of a patent application, participants are asked to indicate the paragraphs in the application's description section (same document) that best explain the contents of the given claim.
- Flowchart Recognition Task: Extract the information in flowchart images and return it in a predefined textual format.
- Chemical Structure Recognition Task: Starting from TIFF images containing patent scans, participants are asked to identify the location of the chemical structures depicted on these pages and, for each of them, return the corresponding structure in a MOL file (a chemical structure file format).
Lab Coordinators: Vienna University of Technology (AT), SAIC-Frederick Inc. (US), Fraunhofer SCAI (DE), U. of Birmingham (UK). 
Lab Webpage: http://ifs.tuwien.ac.at/~clef-ip/

ImageCLEF - Cross Language Image Retrieval
This lab evaluates the cross-language annotation and retrieval of images by focusing on the combination of textual and visual evidence. Four challenging tasks are foreseen:
- Medical task: image modality classification and image retrieval with visual, semantic and mixed topics in several languages, using a data collection from the biomedical literature;
- Photo annotation and retrieval: semantic concept detection and concept-based retrieval using Flickr data, and large-scale annotation using general Web data;
- Plant identification: visual classification of leaf images for the identification of plant species;
- Robot vision: semantic localisation of a mobile robot using multimodal place classification, with special focus on generalization.
Lab Coordinators: IDIAP (CH), National Library of Medicine (US), U. of Applied Sciences Western Switzerland (CH), CEA LIST (FR), Harvard Medical School (US), Yahoo! Research (ES), Nuance Communications (US), INRA-AMAP (FR), INRIA (FR), U. Politècnica de Valencia (ES).
Lab Webpage: http://www.imageclef.org/

INEX - INitiative for the Evaluation of XML Retrieval
INEX has been pioneering structured retrieval since 2002, and will join forces with CLEF, running five tracks:
- Social Book Search Track: studying the value of user-generated descriptions in addition to formal metadata on a collection of Amazon Books and LibraryThing.com data.
- Data Centric Track: studying ad hoc search and faceted search on a collection of Linked Data (DBpedia) tied to a large corpus (Wikipedia).
- Snippet Retrieval Track: studying the generation of informative snippets with sufficient information to determine the relevancy of search results.
- Show Me Your Code Track: asking participants to submit system components (in particular feedback) rather than results.
- Tweet Contextualization Track: retrieving synthetic contextual information from Wikipedia in response to a tweet with a URL on a small terminal like a phone.
Lab Coordinators: Queensland University of Technology (AU), University of Amsterdam (NL), Saarland University/MPI (DE), and the track organizers.
Lab Webpage: http://inex.mmci.uni-saarland.de/

PAN - Uncovering Plagiarism, Authorship, and Social Software Misuse
PAN offers three tasks:
- Plagiarism Detection. This task features a new plagiarism corpus based on ClueWeb09, the new search engine ChatNoir, which indexes the corpus, the cloud-based algorithm evaluation architecture TIRA, and, for the first time, real plagiarism cases. At the conference, keynotes about cross-language plagiarism detection will be given by Roberto Navigli (Università La Sapienza) and Ralf Steinberger (European Commission, JRC).
- Author Identification. This task focuses on identifying sexual predators in chat logs and on authorship verification. Moreover, it features for the first time real cases of disputed authorship.
- Quality Flaw Prediction in Wikipedia. This task is newly introduced, and it is about identifying Wikipedia articles which contain certain information quality flaws. It generalizes the vandalism detection task of last year.
Lab Coordinators: Bauhaus-Universität Weimar (DE), U. Politécnica de Valencia (ES), U. of the Aegean (GR), Bar-Ilan University (IL), Illinois Institute of Technology (US), Duquesne University (US), and U. of Lugano (CH).
Lab Webpage: http://pan.webis.de

QA4MRE - Question Answering for Machine Reading Evaluation
The goal of QA4MRE is to evaluate Machine Reading abilities through Question Answering and Reading Comprehension Tests. The task focuses on the reading of single documents and the identification of the answers to a set of questions about information that is stated or implied in the text. Questions are multiple choice, each having five options and only one correct answer. The participating systems will be required to answer the questions by choosing in each case one answer from the five alternatives. Systems may use knowledge from given texts to assist with answering the questions; however, the principal answer is to be found among the facts contained in the test documents given. Two additional pilots are also proposed:
- Processing Modality and Negation for Machine Reading: aimed at evaluating whether systems are able to understand extra-propositional aspects of meaning like modality and negation. 
- Machine Reading of Biomedical Texts about Alzheimer: aimed at setting questions in the biomedical domain with a special focus on Alzheimer's disease.
Lab Coordinators: UNED (ES), ISI (US), CELCT (IT), University of Limerick (IE), University of Antwerp (BE).  
Lab Webpage: http://celct.fbk.eu/QA4MRE/

RepLab 2012
Online Reputation Management deals with the image that online media project about individuals and organizations. The aim is to bring together the Information Access research community with representatives from the Online Reputation Management industry, with the goals of (i) establishing a five-year roadmap that includes a description of the language technologies required in terms of resources, algorithms, and applications; (ii) specifying suitable evaluation methodologies and metrics; and (iii) developing test collections that enable systematic comparison of algorithms and reliable benchmarking of commercial systems. Two shared tasks on Twitter data are offered:
(i) a monitoring task, where the goal is to thematically cluster tweets including a company's name as a step towards early alerting on issues that may damage the company's reputation.
(ii) a profiling task, where the goal is to annotate tweets according to their polarity for reputation (i.e. as to whether their content has positive/negative implications for the company's reputation).
Lab Coordinators: Llorente & Cuenca, U. Amsterdam, UNED
Lab Webpage: http://www.limosine-project.eu/events/replab2012

WORKSHOP

CLEFeHealth 2012
CLEFeHealth 2012 is a one-day workshop on cross-language evaluation of methods, applications, and resources for eHealth document analysis with a focus on written and spoken natural-language processing. We invite research, industry and government representatives to develop with us a roadmap towards the vision of using systematically evaluated ICT tools to analyse and integrate eHealth documents across languages, genres, and jargons. We call for 1-2 page abstracts on: (a) evaluation of mono- and multilingual methods, applications, and resources for eHealth document analysis; and (b) development of statistical and user-feedback based evaluation protocols, settings, methods and measures for cross-language evaluation of methods, applications, and resources for eHealth document analysis. We have a double-blind review process. Please note that the submission deadline is May 2012.
Lab Coordinators: National ICT Australia (NICTA)
Lab Webpage: www.nicta.com.au/clefehealth2012

================================
Pamela Forner
CELCT (web: www.celct.it)
Center for the Evaluation of Language and Communication Technologies
Via alla Cascata 56/c 
38100 Povo - TRENTO -Italy

email: forner at celct.it
tel.:  +39 0461 314 804
fax:  +39 0461 314 846

Secretary Phone:  +39 0461 314 870
