Call: MEMURA-2004: Workshop on Methodologies and Evaluation of Multiword Units

alexis.nasr at LINGUIST.JUSSIEU.FR
Tue Feb 3 08:53:22 UTC 2004

******************* SECOND CALL FOR PAPERS *******************

Workshop on Methodologies and Evaluation of Multiword Units
               in Real-world Applications
                   (MEMURA Workshop)


    In association with the 4th International Conference On
         Language Resources and Evaluation - LREC 2004

            Centro Cultural de Belém, Lisbon, Portugal
                          May 25, 2004


********************* CALL FOR PAPERS *********************

This announcement contains:
   [1] Workshop Description
   [2] Target Audience
   [3] Areas of Interest
   [4] Invited Speaker
   [5] Important dates
   [6] Abstract Submission
   [7] Workshop Chairs
   [8] Program Committee
   [9] Contact

[1] Workshop Description:

Multiword units (MWUs) include a large range of linguistic phenomena,
such as phrasal verbs (e.g. "look forward"), nominal compounds
(e.g. "interior designer"), named entities (e.g. "United Nations"),
set phrases (e.g. "con carne") or compound adverbs (e.g. "by the
way"), and they can be syntactically and/or semantically idiosyncratic
in nature.  MWUs are used frequently in everyday language, usually to
express precisely those ideas and concepts that cannot be compressed into a
single word. A considerable amount of research has been devoted to
this subject, both in terms of theory and practice, but despite
increasing interest in idiomaticity within linguistic research, many
questions still remain unanswered.  The objective of this workshop is
to deal with three important questions that are of great interest for
real-world applications.

1) Comparison of MWU extraction methodologies

Many methodologies have been proposed to automatically extract or
identify MWUs. However, little effort has been devoted to comparing
their results. The core differences between the methodologies are
certainly the main reason why such work is so rare. For instance, it is
not easy to compare language-dependent methodologies, as the results
depend on the efficiency of parameter tuning in the broad sense of the
term (i.e. semantic tagging, local specific grammars, lemmatization,
part-of-speech tagging etc.). Another important problem is that
researchers do not really agree on a definition of MWUs that would
provide the basis for an objective evaluation. The objective of the
workshop is to gather people who have recently been working in this
area so that new trends in comparing MWU extraction methodologies and
their evaluation can be identified.

2) Evaluation of the benefits of the integration of MWUs in real-world
applications

It is not yet clear whether MWUs really improve NLP applications.  It
is commonly accepted that Machine Translation is one application that
takes great advantage of MWU databanks. However, does the same apply to
applications in Automatic Summarization, Information Retrieval (IR),
Cross-language IR, Information Extraction, Text
Clustering/Classification, Parallel Corpus Alignment? Indeed, could
the identification of MWUs introduce new constraints that are not
present in original texts? Should MWUs be considered as units that
can be analysed in terms of the meaning of their components? Or
should they be treated as unanalysable? Should NLP methods work both
on isolated words and on aggregated MWUs? The answers are anything but
clear. Here, the objective of the workshop is to identify successes
and failures of the integration of MWUs in real-world applications.

3) Comparison of scalable architectures for the extraction and
identification of MWUs

Real-world applications are constrained by variables like processing
time and memory space. However, identifying and extracting MWUs is
usually a computationally heavy process. In recent years, new
algorithms and new technologies have been proposed to introduce MWU
treatment in large-scale applications, thus avoiding earlier
intractable implementations. Previous workshops on MWUs have mainly
focused on the unconstrained extraction process. In this workshop, we
would like to focus on the comparison of different factors that can
influence the scalability of the treatment of MWUs in real-world
applications, namely data structures, algorithms, parallel and
distributed computing, grid computing etc. Indeed, as we said earlier,
some extraction strategies may not scale to deal with huge volumes of

[2] Target Audience:

This workshop is intended to bring together NLP researchers working on
all areas of MWUs. The objective is to summarise what has been
achieved in the area of MWUs in real-world applications, to establish
common themes between different approaches, and to discuss future

[3] Areas of Interest:

Abstracts are invited on, but not limited to, the following topics:

     * Automatic, semi-automatic and manual evaluations of MWUs
     * Resources for evaluating MWU extractors
     * Evaluation Standards
     * Cross-language and Cross-domain evaluations of MWU extractors
     * Comparative evaluation of MWU extractors
     * Evaluation of the integration of MWUs in NLP applications:
       Summarization, (Cross-language) Information Retrieval, Information
       Extraction, Machine Translation, Text Classification etc.
     * Scalable algorithms, new data structures, Parallel and Distributed
       processing and Grid computing for MWU extraction and/or
     * Comparative evaluation of extraction software architectures
     * Role of isolated words and MWUs for a sense-based definition of

Abstracts can cover one or more of these areas.

[4] Invited Speaker:

Kenneth W. Church (AT&T Labs Research, USA)

[5] Important dates:

Abstract submission deadline: February 23, 2004
Notification: March 15, 2004
Camera ready papers: April 12, 2004
Workshop: May 25, 2004

[6] Abstract Submission:

Abstracts should be about 1000 words long. They should be submitted
electronically, in PDF format only, to Gaël Harry Dias
[ddg at]. The following URL transforms postscript files to pdf
files ( The subject line should be "LREC 2004

Because reviewing is blind, no author information should be included as
part of the abstract (i.e. the names of the authors and references that
could identify the authors). An identification page must be sent in a
separate email with the subject line "LREC 2004 MEMURA WORKSHOP ID PAGE"
and must include title, author(s), keywords, word count and name and
email of the contact author.

Late submissions will not be accepted. Notification of receipt will be
emailed to the contact author shortly after receipt.

[7] Workshop Chairs:

Gaël Harry Dias (Beira Interior University, Portugal)
José Gabriel Pereira Lopes (New University of Lisbon, Portugal)
Spela Vintar (University of Ljubljana, Slovenia)

[8] Program Committee:

Timothy Baldwin (Stanford University, United States of America)
Sophia Ananiadou (University of Salford, England)
Didier Bourigault (University of Toulouse, France)
Pascale Fung (University of Science and Technology, Hong Kong)
Mikio Yamamoto (University of Tsukuba, Japan)
Dekang Lin (University of Alberta, Canada)
Aline Villavicencio (University of Cambridge, England)
Heiki Kaalep (University of Tartu, Estonia)
Joaquim da Silva (New University of Lisbon, Portugal)
Eric Gaussier (Xerox Research Centre Europe, France)
Adeline Nazarenko (University Paris XIII, France)
António Branco (Lisbon University, Portugal)

[9] Contact:


Gaël Harry Dias
Human Language Technology Interest Group
Departamento de Informática
Universidade da Beira Interior
Rua Marquês d'Ávila e Bolama
6201-001 Covilhã
email: ddg at
Tel: +351 275 319 700
Fax: +351 275 319 732

