[Corpora-List] PASSAGE 2nd French Parsing Evaluation Campaign

Paroubek pap at limsi.fr
Tue May 26 18:19:10 UTC 2009


--------------------------------------------------------------------------------
Call for Participation in the Second PASSAGE French Parsing Evaluation
Campaign
--------------------------------------------------------------------------------

We invite developers of parsers for French, from both academia and
industry, to participate in the second PASSAGE evaluation campaign. In
the course of achieving the two main objectives of the PASSAGE
project (http://atoll.inria.fr/passage):

1. building a large treebank for French and making it available
to the community
 
2. and investigating lexical acquisition for parser improvement by
merging parser outputs,

we offer parser developers the opportunity to test and compare
their systems on the PASSAGE corpus. This campaign follows the first
PASSAGE evaluation campaign, which took place in 2007, and the EASY
evaluation campaign (2006) of the EVALDA project in the TECHNOLANGUE
program.

Participation is free and voluntary (registration is required, see
below) and gives access to all the resources that PASSAGE has built
(corpora, annotation editors, evaluation toolkits). To have a look at
the PASSAGE annotations, you can try EasyRef
(http://atoll.inria.fr/easyrefpub/), the interactive Web annotation
editor, now freely open to all potential participants.

--------
The data
--------
The corpus that we ask you to parse is a 100-million-word collection
of material freely available on the Web, supplemented with a small
amount of copyrighted newspaper material. Word and sentence
segmentations are not imposed. The participant data will be mapped
onto the reference data using a dynamic programming algorithm.
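
To illustrate what mapping one segmentation onto another can look
like, here is a minimal sketch of a token alignment by dynamic
programming. It is an assumption made for illustration only: the
announcement does not specify the algorithm, and the function name
align_tokens and the unit edit costs are hypothetical, not part of
the PASSAGE toolkit.

```python
def align_tokens(reference, hypothesis):
    """Align two token lists with an edit-distance dynamic program.

    Returns a list of (ref_index, hyp_index) pairs; None marks a token
    that has no counterpart on the other side.
    Illustrative sketch only, not the PASSAGE mapping algorithm.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j]: minimal cost of aligning reference[:i] with hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # pair the two tokens
                cost[i - 1][j] + 1,             # reference token unpaired
                cost[i][j - 1] + 1,             # hypothesis token unpaired
            )
    # Walk back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == (
            cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
        ):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


reference = ["l'", "enfant", "dort"]
hypothesis = ["l'enfant", "dort"]
alignment = align_tokens(reference, hypothesis)
```

In this example the parser keeps "l'enfant" as one token where the
reference has two, and the alignment still pairs "dort" with "dort".
A real mapping would probably work on character offsets rather than
whole tokens; that refinement is left out here.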

---------------------
Syntactic Annotations
---------------------
The syntactic annotations used in PASSAGE are derived from those
used during the EASY campaign. The annotations are described in
various papers
(http://atoll.inria.fr/passage/articles.en.html). Documentation and
software support are available on the PASSAGE site, along with EasyRef
(http://atoll.inria.fr/easyrefpub/), an open Web annotation editor, and
an evaluation server that automatically compares one's parser output
against the PASSAGE development data (approx. 85,600 words) drawn from
the previous PASSAGE campaign and the EASY campaign.

--------
Schedule 
--------
Both the development corpus resulting from the EASY evaluation
campaign and the PASSAGE-2 test corpus of 100 million words are now
available. They will be sent to participants as soon as they have
registered. Participants in PASSAGE-2 are expected to return
the PASSAGE-2 corpus fully parsed with the PASSAGE annotations
between August 31, 2009 and October 15, 2009.

-----------------
Evaluation Tracks
-----------------
PASSAGE-2 will have 2 evaluation tracks:

1. the manual reference track with its gold standard of 400,000 words
that have been hand-annotated

2. the automatic reference track, where the gold standard will be the
result of combining the outputs of the participating parsers.
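
One simple way to picture such a combination is a vote over the
relations proposed by each parser. The sketch below is an assumption
for illustration only: the announcement does not say how the outputs
will be combined, and the function name vote_relations, the
(label, source, target) triples and the threshold are hypothetical.

```python
from collections import Counter


def vote_relations(parser_outputs, min_votes):
    """Keep the relations proposed by at least min_votes parsers.

    parser_outputs: one collection of (label, source, target) triples
    per parser. Illustrative sketch only, not the PASSAGE-2 procedure.
    """
    counts = Counter()
    for relations in parser_outputs:
        # set() ensures each parser votes at most once per relation
        counts.update(set(relations))
    return {rel for rel, votes in counts.items() if votes >= min_votes}


# Three hypothetical parsers; labels and token indices are made up.
outputs = [
    {("subject", 1, 2), ("object", 3, 2)},
    {("subject", 1, 2)},
    {("subject", 1, 2), ("object", 4, 2)},
]
merged = vote_relations(outputs, min_votes=2)
```

Here only the subject relation is kept, because it is the only one
proposed by at least two of the three parsers.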

For all registered participants, performance results of the first
track will be published with the participant identified, while
performance results of the second track will be published anonymously
because of the exploratory nature of the reference data.

-----------------------------------------
Link with EVALITA dependency parsing task
-----------------------------------------
To explore possible links with parsing evaluation for other
languages, the PASSAGE-2 campaign shares a small development and test
corpus with the EVALITA (http://evalita.fbk.eu/) campaign on
dependency parsing (http://evalita.fbk.eu/parsing.html). Aligned data
in French and Italian have been hand-annotated (200 sentences for
development and 50 sentences for test), with PASSAGE annotations
for the French part and TUT annotations for the Italian part.

----------------------------
Conditions for participation
----------------------------
Registration is now open and requires signing a participation
agreement available from ELDA. Participants are required to return the
parsed test corpus according to the schedule above and to agree to the
publication of their identified performance results by the PASSAGE-2
organizers. Please contact Olivier Hamon at ELDA to obtain the
participation agreement.

--------
Contacts
--------
PASSAGE-2 is organized by:

* ILES (Information Langues Écrites et Signées) of LIMSI-CNRS (Patrick
 Paroubek pap at limsi.fr, or Anne Vilnat anne.vilnat at limsi.fr)

* ELDA (Olivier Hamon, hamon at elda.org)

* INRIA-ALPAGE (Eric de la Clergerie, Eric.De_La_Clergerie at inria.fr). 

with the help of all the PASSAGE participants.

PASSAGE (ANR-06-MDCA-013) is funded by ANR
(http://www.agence-nationale-recherche.fr).
--------------------------------------------------------------------------------

-----
Patrick Paroubek / LIR Group / Human-Machine Communication Dept.
LIMSI - CNRS, Batiment 508 Universite Paris XI, BP 133 - 91403 ORSAY Cedex - France
phone: (33) (0)1 69 85 80 04 fax: (33) (0)1 69 85 80 88 email:pap at limsi.fr
