[Corpora-List] 2nd Call for Evaluation Resources - MWE 2008 at LREC Conference

Stefan Evert stefan.evert at uos.de
Sun Jan 20 22:48:28 UTC 2008


########################################################

  SECOND CALL FOR EVALUATION RESOURCES

 >> LREC2008 - Towards a Shared Task for Multiword Expressions (MWE 2008) <<

  endorsed by the ACL Special Interest Group on the Lexicon (SIGLEX)


Date: Sunday, 1 June 2008
Location: Marrakech, Morocco
Deadline: Friday, 1 Feb 2008 (resources) / Friday, 29 Feb 2008 (papers)

Workshop web page: http://multiword.sf.net/mwe2008/

########################################################

In recent years, considerable progress has been made in our understanding
of multiword expressions (MWE), the development of algorithms for their
automatic extraction from corpora, and the automatic identification of
additional properties such as morphosyntactic preferences or the
interpretation of semi-compositional expressions.

It is difficult, though, to compare the results of the many published
studies on MWEs and obtain a broader perspective, because algorithms and
implemented systems have been evaluated on vastly different gold standards
and corpora, in different languages, for different subtypes of MWEs, etc.
In order to make the next big step forward, the field of MWE research
needs a shared task in which different approaches are applied to the same
data sets, allowing completely new insights to be gained. Since there is
as yet no clear and universally accepted definition of multiword
expressions, the first instalment of this shared task will be of a more
exploratory nature than the competitions that have been carried out in
other areas of computational linguistics.

The MWE 2008 workshop is primarily intended as a forum for collecting,
sharing and exploiting MWE evaluation resources.  We solicit contributions
of such resources from the MWE community, in particular:

  (1) manually annotated data sets (MWE candidates marked as true and
      false positives, or as different subtypes of MWEs);

  (2) data sets of MWEs annotated with additional properties; and

  (3) lists of known MWEs, e.g. from machine-readable dictionaries.

In addition, candidate data obtained from corpora with sophisticated
proprietary NLP tools may be of interest, helping researchers to apply
their statistical MWE identification techniques to a broad range of
languages.

The contributed resources will be made available freely for research
purposes on multiword.sf.net, and should be accompanied by documentation
(e.g. annotation guidelines) on the SourceForge project wiki. Contributors
will be invited to submit a short paper (4 pages) describing their
resource and summarising previous research carried out on these data.

After collection of the resources, teams participating in the shared task
can evaluate their MWE extraction algorithms on multiple data sets and
discuss implications for their generalisability and further development.
At the workshop, the evaluation results of the different teams will be
summarised and compared. A call for papers and participation in the shared
task is being distributed separately.
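
As a rough illustration of how a data set of type (1) might be used for
such an evaluation, the sketch below computes precision over the n
highest-ranked candidates. It is not part of the shared task definition:
the function name precision_at_n, the (candidate, is_mwe) pair format and
the example candidates are all assumptions made for this example.

```python
from typing import List, Tuple


def precision_at_n(ranked: List[Tuple[str, bool]], n: int) -> float:
    """Fraction of true positives among the n highest-ranked candidates.

    `ranked` is assumed to be sorted already, best candidate first, by
    whatever association score the extraction method produces.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for _, is_mwe in top if is_mwe) / len(top)


# Hypothetical annotated candidate list (True = marked as a true positive).
candidates = [
    ("kick the bucket", True),
    ("red tape", True),
    ("the bucket", False),
    ("take a walk", True),
    ("of the", False),
]

if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"precision at n={n}: {precision_at_n(candidates, n):.2f}")
```

Running the same function over several contributed data sets would give a
simple, comparable figure per data set; other measures (recall, average
precision) could be substituted in the same way.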


SUBMISSION INFORMATION

**Evaluation Resources**

Please send your resource and documentation to Nicole Grégoire
(Nicole.Gregoire at let.uu.nl). You will then receive an account for the
MWE wiki on which you can publish basic information about your resource.

The resources will be made available as a downloadable package on our
SourceForge project page (http://sourceforge.net/projects/multiword) under
a suitable open-source license (please specify if you require different
licensing terms). A list of all available resources will be published on
the multiword.sf.net Web site.

To give shared task participants sufficient time to re-evaluate their
models, we have set the deadline for submitting resources to 1 February
2008. Contributors who submit their resources before the deadline are
invited to submit a short paper for the workshop proceedings. Resource
submissions after the deadline (and even after the workshop) are of course
possible and welcome.

**Paper**

Short papers describing evaluation resources must adhere to the format of
LREC proceedings (preferably using the style files provided on the
conference Web site) and must not exceed four (4) pages, including
references. Only submissions in PDF format will be considered.

The papers must be submitted no later than 23:59 GMT on February 29, 2008.
Papers submitted after that time cannot be reviewed.

Please submit your paper here:
https://www.softconf.com/LREC2008/MWE2008/submit.html


IMPORTANT DATES

Resource submission deadline: February 1, 2008
Paper submission deadline: February 29, 2008
Notification of acceptance: March 28, 2008
Camera-ready papers due: April 4, 2008
Workshop date: June 1, 2008


WORKSHOP CHAIRS

Nicole Grégoire
University of Utrecht, The Netherlands

Stefan Evert
University of Osnabrueck, Germany

Brigitte Krenn
Austrian Research Institute for Artificial Intelligence (ÖFAI), Austria


CONTACT

For any inquiries regarding the workshop please contact Nicole Grégoire
(Nicole.Gregoire at let.uu.nl).