[Corpora-List] 2nd Call for Evaluation Resources - MWE 2008 at LREC Conference
Stefan Evert
stefan.evert at uos.de
Sun Jan 20 22:48:28 UTC 2008
########################################################
SECOND CALL FOR EVALUATION RESOURCES
>> LREC2008 - Towards a Shared Task for Multiword Expressions (MWE 2008) <<
endorsed by the ACL Special Interest Group on the Lexicon (SIGLEX)
Date: Sunday, 1 June 2008
Location: Marrakech, Morocco
Deadline: Friday, 1 Feb 2008 (resources) / Friday, 29 Feb 2008 (papers)
Workshop web page: http://multiword.sf.net/mwe2008/
########################################################
In recent years, considerable progress has been made in our understanding of multiword expressions (MWE), the development of algorithms for their automatic extraction from corpora, and the automatic identification of additional properties such as morphosyntactic preferences or the interpretation of semi-compositional expressions.
It is difficult, though, to compare the results of the many published studies on MWEs and to obtain a broader perspective, because algorithms and implemented systems have been evaluated on vastly different gold standards and corpora, in different languages, for different subtypes of MWEs, etc. In order to make the next big step forward, the field of MWE research needs a shared task in which different approaches are applied to the same data sets, allowing completely new insights to be gained. Since there is not yet a clear and universally accepted definition of multiword expressions, the first instalment of this shared task will be more exploratory in nature than the competitions that have been carried out in other areas of computational linguistics.
The MWE 2008 workshop is primarily intended as a forum for collecting, sharing and exploiting MWE evaluation resources. We solicit contributions of such resources from the MWE community, in particular:
(1) manually annotated data sets (MWE candidates marked as true and false positives, or as different subtypes of MWEs);
(2) data sets of MWEs annotated with additional properties; and
(3) lists of known MWEs, e.g. from machine-readable dictionaries.
In addition, candidate data obtained from corpora with sophisticated proprietary NLP tools may be of interest, helping researchers to apply their statistical MWE identification techniques to a broad range of languages.
The contributed resources will be made freely available for research purposes on multiword.sf.net and should be accompanied by documentation (e.g. annotation guidelines) on the SourceForge project wiki. Contributors will be invited to submit a short paper (4 pages) describing their resource and summarising previous research carried out on these data.
After the resources have been collected, teams participating in the shared task can evaluate their MWE extraction algorithms on multiple data sets and discuss the implications for their generalisability and further development. At the workshop, the evaluation results of the different teams will be summarised and compared. A call for papers and participation in the shared task is being distributed separately.
SUBMISSION INFORMATION
**Evaluation Resources**
Please send your resource and documentation to Nicole Grégoire (Nicole.Gregoire at let.uu.nl). You will then receive an account for the MWE wiki, on which you can publish basic information about your resource.
The resources will be made available as a downloadable package on our SourceForge project page (http://sourceforge.net/projects/multiword) under a suitable open-source license (please specify if you require different licensing terms). A list of all available resources will be published on the multiword.sf.net Web site.
To give shared task participants sufficient time to re-evaluate their models, we have set the deadline for submitting resources to 1 February 2008. Contributors who submit their resources before the deadline are invited to submit a short paper for the workshop proceedings. Resource submissions after the deadline (and even after the workshop) are of course possible and welcome.
**Paper**
Short papers describing evaluation resources must adhere to the LREC proceedings format (preferably using the style files provided on the conference Web site) and must not exceed four (4) pages, including references. Only submissions in PDF format will be considered. Papers must be submitted no later than 23:59 GMT on February 29, 2008. Papers submitted after that time cannot be reviewed.
Please submit your paper here: https://www.softconf.com/LREC2008/MWE2008/submit.html
IMPORTANT DATES
Resource submission deadline: February 1, 2008
Paper submission deadline: February 29, 2008
Notification of acceptance: March 28, 2008
Camera-ready papers due: April 4, 2008
Workshop date: June 1, 2008
WORKSHOP CHAIRS
Nicole Grégoire
University of Utrecht, The Netherlands
Stefan Evert
University of Osnabrueck, Germany
Brigitte Krenn
Austrian Research Institute for Artificial Intelligence (ÖFAI), Austria
CONTACT
For any inquiries regarding the workshop please contact Nicole Grégoire
(Nicole.Gregoire at let.uu.nl).
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora