[Corpora-List] Call for Papers: LREC 2004 Workshop: MEMURA

Tue Jan 6 10:26:36 UTC 2004

********************* CALL FOR PAPERS *********************

                      MEMURA-2004 
Workshop on Methodologies and Evaluation of Multiword Units 
              in Real-world Applications 
                  (MEMURA Workshop)

          INVITED SPEAKER: KENNETH W. CHURCH

   In association with the 4th International Conference on
        Language Resources and Evaluation - LREC 2004

           Centro Cultural de Belém, Lisbon, Portugal
                         May 25, 2004

                 http://memura2004.di.ubi.pt

********************* CALL FOR PAPERS *********************

This announcement contains:
  [1] Workshop Description
  [2] Target Audience
  [3] Areas of Interest
  [4] Invited Speaker
  [5] Important dates
  [6] Abstract Submission
  [7] Workshop Chairs
  [8] Program Committee
  [9] Contact

-------------------------------------------------------------------------
[1] Workshop Description:
-------------------------------------------------------------------------

Multiword units (MWUs) cover a wide range of linguistic 
phenomena, such as phrasal verbs (e.g. "look forward"), 
nominal compounds (e.g. "interior designer"), named entities 
(e.g. "United Nations"), set phrases (e.g. "con carne") and 
compound adverbs (e.g. "by the way"), and they can be 
syntactically and/or semantically idiosyncratic in nature. 
MWUs are frequent in everyday language, where they are often 
used to express with precision ideas and concepts that cannot 
be compressed into a single word. A considerable amount of 
research, both theoretical and practical, has been devoted to 
this subject, but despite increasing interest in idiomaticity 
within linguistic research, many questions remain unanswered. 
The objective of this workshop is to address three questions 
that are of particular importance for real-world applications.

1) Comparison of MWU extraction methodologies

Many methodologies have been proposed to automatically extract
or identify MWUs. However, little effort has been devoted to
comparing their results. The fundamental differences between the
methodologies are certainly the main reason why such comparisons
are so rare. For instance, it is not easy to compare
language-dependent methodologies, as their results depend on the
effectiveness of parameter tuning, in the broad sense of the term
(i.e. semantic tagging, local specific grammars, lemmatization,
part-of-speech tagging etc.). Another important problem is that
researchers have not agreed on a definition of MWUs that could
serve as the basis for an objective evaluation. The objective of
the workshop is to gather people who have recently been working
in this area so that new trends in comparing MWU extraction
methodologies and their evaluation can be identified.

2) Evaluation of the benefits of the integration of MWUs in real-world
applications

It is not yet clear whether MWUs really improve NLP applications.
It is widely accepted that Machine Translation is one application
that benefits greatly from MWU databanks. However, does the same apply
to applications in Automatic Summarization, Information Retrieval (IR),
Cross-language IR, Information Extraction, Text
Clustering/Classification or Parallel Corpus Alignment? Moreover, could
the identification of MWUs introduce new constraints that are not
present in the original texts? Should MWUs be analysable in terms of
the meaning of their components, or should they be treated as
unanalysable units? Should NLP methods work both on isolated words
and on aggregated MWUs? The answers are anything but clear. Here,
the objective of the workshop is to identify the successes and failures
of the integration of MWUs in real-world applications.

3) Comparison of scalable architectures for the extraction and
identification of MWUs

Real-world applications are constrained by variables such as processing
time and memory space. However, identifying and extracting MWUs is
usually a computationally heavy process. In recent years, new algorithms
and new technologies have been proposed to introduce MWU treatment in
large-scale applications, thus replacing previously intractable
implementations. Previous workshops on MWUs have mainly focused on the
unconstrained extraction process. In this workshop, we would like to
focus on comparing the different factors that can influence the
scalability of MWU treatment in real-world applications, namely
data structures, algorithms, parallel and distributed computing, grid
computing etc. Indeed, as noted above, some extraction strategies
may not scale to huge volumes of data.

-------------------------------------------------------------------------
[2] Target Audience:
-------------------------------------------------------------------------

This workshop is intended to bring together NLP researchers working on
all areas of MWUs. The objective is to summarise what has been achieved
in the area of MWUs in real-world applications, to establish common
themes between different approaches, and to discuss future trends.

-------------------------------------------------------------------------
[3] Areas of Interest:
-------------------------------------------------------------------------

Abstracts are invited on, but not limited to, the following topics:

    * Automatic, semi-automatic and manual evaluations of MWU
      extractors
    * Resources for evaluating MWU extractors
    * Evaluation Standards
    * Cross-language and Cross-domain evaluations of MWU extractors
    * Comparative evaluation of MWU extractors
    * Evaluation of the integration of MWUs in NLP applications:
      Summarization, (Cross-language) Information Retrieval, Information
      Extraction, Machine Translation, Text Classification etc.
    * Scalable algorithms, new data structures, Parallel and Distributed
      processing and Grid computing for MWU extraction and/or
      identification
    * Comparative evaluation of extraction software architectures
    * Role of isolated words and MWUs for a sense-based definition of
      MWUs

Abstracts can cover one or more of these areas.

-------------------------------------------------------------------------
[4] Invited Speaker:
-------------------------------------------------------------------------

Kenneth W. Church (AT&T Labs Research, USA)

-------------------------------------------------------------------------
[5] Important dates:
-------------------------------------------------------------------------

Abstract submission deadline: February 23, 2004
Notification: March 15, 2004
Camera ready papers: April 12, 2004
Workshop: May 25, 2004

-------------------------------------------------------------------------
[6] Abstract Submission:
-------------------------------------------------------------------------

Abstracts should be about 1000 words long and should be submitted
electronically, in PDF format only, to Gaël Harry Dias
[ddg at di.ubi.pt]. PostScript files can be converted to PDF at
http://www.ps2pdf.com/. The subject line should be "LREC 2004
MEMURA WORKSHOP PAPER SUBMISSION".

Because reviewing is blind, no author information should be included in
the abstract (i.e. the names of the authors and references that could
identify the authors). An identification page must be sent in a separate
email with the subject line "LREC 2004 MEMURA WORKSHOP ID PAGE" and must
include the title, author(s), keywords, word count, and the name and
email address of the contact author.

Late submissions will not be accepted. Notification of receipt will be
emailed to the contact author shortly after receipt.

-------------------------------------------------------------------------
[7] Workshop Chairs:
-------------------------------------------------------------------------

Gaël Harry Dias (Beira Interior University, Portugal)
José Gabriel Pereira Lopes (New University of Lisbon, Portugal)
Spela Vintar (University of Ljubljana, Slovenia)

-------------------------------------------------------------------------
[8] Program Committee:
-------------------------------------------------------------------------

Timothy Baldwin (Stanford University, United States of America)
Sophia Ananiadou (University of Salford, England)
Didier Bourigault (University of Toulouse, France)
Pascale Fung (University of Science and Technology, Hong Kong)
Mikio Yamamoto (University of Tsukuba, Japan)
Dekang Lin (University of Alberta, Canada)
Aline Villavicencio (University of Cambridge, England)
Heiki Kaalep (University of Tartu, Estonia)
Joaquim da Silva (New University of Lisbon, Portugal)
Eric Gaussier (Xerox Research Centre Europe, France)
Adeline Nazarenko (University Paris XIII, France)
António Branco (Lisbon University, Portugal)

-------------------------------------------------------------------------
[9] Contact:
-------------------------------------------------------------------------

Gaël Harry Dias
Human Language Technology Interest Group
Departamento de Informática
Universidade da Beira Interior
Rua Marquês d'Ávila e Bolama
6201-001 Covilhã
Portugal
email: ddg at di.ubi.pt
Tel: +351 275 319 700
Fax: +351 275 319 732
-- 
---------------------------------------------------------
Gaël Harry Dias, PhD		| Assistant Professor
Human Language Technology Group | [www.di.ubi.pt/~ddg]
Computer Science Department	| [ddg at di.ubi.pt]
Beira Interior University	| [Tel: +351 275 319 700]
6201-001 - Covilhã - PORTUGAL	| [Fax: +351 275 319 732]
---------------------------------------------------------