[Corpora-List] Second Web People Search Evaluation Workshop: Call for Participation

Satoshi Sekine sekine at cs.nyu.edu
Mon Oct 6 13:32:39 UTC 2008


=======================================================
   Second Web People Search Evaluation Workshop 
             *Call for Participation*
             http://nlp.uned.es/weps
=======================================================

Finding information about people on the World Wide Web is one of the most
common activities of Internet users. Person names, however, are highly
ambiguous. In most cases, the results for a person name search are a mix of
pages about different people sharing the same name. The user is then forced
either to add terms to the query (probably losing recall and focusing on one
single aspect of the person), or to browse every document in order to filter
the information about the person he/she is actually looking for. In an ideal
system the user would simply type a person name, and receive search results
clustered according to the different people sharing that name.

In 2007 the Web People Search Task (Artiles et al. 2007
<http://nlp.uned.es/%7Ejavier/docs/weps2007.pdf>)
was the first competitive evaluation focused on this problem. The 16
participating systems received a set of web pages for a person name, and
they had to cluster them into different entities. This second evaluation
provides a new testbed corpus, improved evaluation metrics, and an
additional attribute extraction subtask.

Task definitions 
----------------
[Clustering]
In this task systems receive as input a set of web search results obtained
when performing a query for an (ambiguous) person name. The expected output
is a clustering of the web pages, where each cluster is assumed to contain
all (and only those) pages that refer to the same individual.
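
As a rough illustration of the input/output shape described above, the
sketch below represents a clustering as a mapping from page id to cluster
label and scores it against a gold standard with B-Cubed precision and
recall. This is only an assumption-laden example: the page ids, labels, and
the function name `bcubed` are hypothetical, the sketch assumes each page
belongs to exactly one cluster, and B-Cubed is used here as one common
clustering measure, not necessarily the metric the organizers apply (see the
clustering guideline for the official definition).

```python
from collections import defaultdict


def bcubed(system, gold):
    """Return (precision, recall) averaged over pages, B-Cubed style.

    Both arguments map a page id to a single cluster label.
    Assumes every page in `gold` also appears in `system`.
    """
    def members(assignment):
        # Invert page -> label into label -> set of pages.
        groups = defaultdict(set)
        for page, label in assignment.items():
            groups[label].add(page)
        return groups

    sys_groups = members(system)
    gold_groups = members(gold)
    precision = recall = 0.0
    for page in gold:
        sys_cluster = sys_groups[system[page]]
        gold_cluster = gold_groups[gold[page]]
        overlap = len(sys_cluster & gold_cluster)
        precision += overlap / len(sys_cluster)
        recall += overlap / len(gold_cluster)
    n = len(gold)
    return precision / n, recall / n


# Hypothetical pages returned for one ambiguous person name.
gold = {"p1": "person_A", "p2": "person_A", "p3": "person_B", "p4": "person_B"}
system = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
precision, recall = bcubed(system, gold)
```

In this toy case the system merges one page of person_B into person_A's
cluster, which lowers precision, and splits person_B across two clusters,
which lowers recall.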
 
[Attribute Extraction]
This subtask consists of extracting 18 kinds of "attribute values" for
target individuals whose names appear on each of the provided Web pages. The
organizers will distribute the target Web pages in their original format
(i.e., html), and the participant systems have to extract attribute values
from each page.

[Complete guidelines and data]
- Clustering task guideline http://nlp.uned.es/weps/weps2/WePS2_Clustering.pdf
- Attribute Extraction task guideline http://nlp.uned.es/weps/weps2/WePS2_Attribute_Extraction.pdf
- Training/development data http://nlp.uned.es/weps/weps-2-data

Participation
-------------
The clustering and the attribute extraction tasks will be regarded as two
separate subtasks, so a team can choose to participate in either one or
both of them. The organizers will provide annotated data for
developing/training systems. In a second stage, an unannotated corpus will
be distributed, system output will be collected, and evaluation results
will be returned to the participants. Each team can submit up to five runs.
Every team is expected to write a paper describing its system and
discussing the evaluation results.


How to register
---------------
Please send an email expressing your interest to the task organizers (
weps-organizers at lsi.uned.es).

Important Dates
---------------
   - October 2008: Distribute the training data + CFP
   - December 1-8, 2008: Evaluation
   - December 17, 2008: Return the evaluation results
   - February 2009: Papers due
   - April 2x, 2009: Workshop in Madrid.

Workshop Organizers
-------------------
   - Javier Artiles, NLP & IR Group (UNED)
   - Julio Gonzalo, NLP & IR Group (UNED)
   - Satoshi Sekine, Proteus Project (NYU)

Program Committee
-----------------
   - Eneko Agirre, UBC
   - Breck Baldwin, Alias-i
   - Andrew Borthwick, Spock
   - Jeremy Ellman, Northumbria University
   - Donna Harman, National Institute of Standards and Technology (NIST)
   - Eduard Hovy, ISI
   - Dmitri Kalashnikov, University of California, Irvine
   - Paul Kalmar, Fair Isaac
   - Bernardo Magnini, FBK-irst, Italy
   - Gideon Mann, Google
   - Yutaka Matsuo, Tokyo University
   - Manabu Okumura, Tokyo Inst. of Tech.
   - Ted Pedersen, University of Minnesota
   - Massimo Poesio, University of Essex
   - Maarten de Rijke, University of Amsterdam
   - Mark Sanderson, University of Sheffield
   - Arjen P. de Vries, Centrum Wiskunde & Informatica


Updated information about the task can be found at the WePS web site (
http://nlp.uned.es/weps).


Best Regards,
Satoshi Sekine


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


