[ln] Call: ELECTRA 2005

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Tue Mar 29 08:15:14 UTC 2005

Date: Sat, 26 Mar 2005 14:50:43 -0000 (WET)
From: ddg at di.ubi.pt
Message-ID: <1587. at www.di.ubi.pt>
X-url: http://research.yahoo.com/workshops/electra2005/
X-url: http://www.dcc.ufmg.br/eventos/sigir2005/

[Apologies for Multiple Postings]

============================CALL FOR PAPERS============================

     ELECTRA Workshop on Methodologies and Evaluation of Lexical
            Cohesion Techniques in Real-world Applications
                        (Beyond Bag of Words)

          In association with the 28th Annual International
           ACM SIGIR Conference on Research and Development
                in Information Retrieval (SIGIR 2005)

                  Sponsored by Yahoo! Research Labs

                   Pestana Bahia, Salvador, Brazil

                            August 19, 2005


============================CALL FOR PAPERS============================


[1] Description
[2] Target Audience
[3] Areas of Interest
[4] Important Dates
[5] Paper Submission
[6] Organising Committee
[7] Program Committee
[8] Contact

[1] Description:

Lexical cohesion can be subdivided into two distinct areas: (1)
lexical associations, which embody a wide spectrum of language
phenomena such as named entities, multiword units, collocations and
word co-occurrences; and (2) lexical relations, which provide evidence
of the semantic and discourse structure of text through relations
between terms over large distances.

The central goal of this workshop is to bring together researchers in
NLP and IR to discuss the use of lexical cohesion in text
applications, such as document and passage retrieval, question
answering, topic segmentation and text summarization. Indeed, despite
the fact that both communities are working with the same material
(human language), collaboration between them has so far been limited.

In this workshop we are interested in identifying the successes and
failures of integrating lexical cohesion into real-world IR
applications. On the one hand, lexical cohesion has received much
attention in Information Retrieval research during its more than
30-year history, but so far with mixed results. On the other hand,
the Natural Language Processing community has devoted a considerable
amount of research to this subject, both in theory and in practice,
but with limited evaluation in real-world applications. We are now at
a point where both communities should meet to discuss these issues,
and this is the objective of this workshop.

In particular, we will address two questions that are of great
importance for real-world IR applications.

1) Efficient methodologies for Lexical Cohesion identification

Lexical cohesion has received attention in IR research since its
outset.  We can point to (a) the identification and the use of
multiword units for indexing and search, and (b) the extraction of
long-distance lexical relations for tasks such as passage retrieval,
topic segmentation or text summarization.

On the one hand, the interest in multiword units (or phrases) can be
partially attributed to the fact that phrases typically have a higher
information content and specificity than single words, and therefore
represent the concepts expressed in text more accurately than single
words.
On the other hand, interest in long-distance lexical relations in text
has been motivated in IR research by the recognition that most IR
models are limited by their assumption of term independence in
text. As a consequence, a number of techniques, such as passage
retrieval and query expansion, have been developed to overcome the
limitations of term independence models.

The choice of methodologies and techniques for these tasks has
always been constrained by efficiency, which is critical for
real-world IR applications. Indeed, real-world IR applications are
constrained by variables such as processing time and memory space,
whereas identifying and extracting lexical associations and lexical
relations is a computationally intensive process. In recent years,
new algorithms and technologies have been proposed to bring lexical
cohesion techniques to large-scale applications, thus avoiding
previously intractable implementations.

Previous workshops on lexical cohesion have mainly focused on the
extraction process without regard to efficiency constraints. In this
workshop, we would like to focus on comparing the factors that can
influence the scalability of lexical cohesion processing in
real-world applications, such as data structures, algorithms, and
parallel, distributed or grid computing. We are also interested in
new methodologies for lexical cohesion whose complexity measurements
show that they can easily scale to real-world applications.

2) Evaluation of the benefits of Lexical Cohesion in IR applications

Contiguous lexical associations have often been used in experimental
IR systems. Different techniques have been studied for this purpose:
(a) statistical methods based on co-occurrence statistics or n-gram
language modeling techniques; (b) hybrid techniques based on simple
statistics and shallow linguistic techniques such as part-of-speech
tagging and noun-phrase chunking; and (c) knowledge-based
techniques. However, the contribution of phrase matching has not
been systematically quantified.  Moreover, the
evaluation of such techniques is difficult in IR applications, as the
number of environment variables is very large and each system combines
a variety of indexing and matching techniques. Therefore, a more
focused and systematic approach towards analyzing the uses of lexical
associations in IR and their evaluation is needed. This workshop will
provide a framework for such analysis, and will present for discussion
a number of challenging questions regarding the use of lexical
associations in text. In particular we will ask questions such as: How
should multiword units be incorporated into IR models designed for
single terms? What weighting models can be used for them? How should
they be matched against their lexical-syntactic variants in text? How
should we handle non-contiguous lexical associations? How can we avoid
over-weighting a phrase occurrence in a document matching more than
one phrase in the query? These are only a few of the questions in a
vast field of research full of unsolved problems.

In contrast with contiguous lexical units, relations between
non-contiguous lexical units are important building blocks of the
text, forming its lexical cohesion. Indeed, the complete meaning of a
word in text can only be realized when it is interpreted in
combination with the surrounding words, forming lexical cohesive ties
with them. These lexical relations have been used for a number of IR
tasks, for example query expansion, passage retrieval, topic
segmentation and text summarization.  However, most of these
techniques do not use deep semantic or discourse structure
information to identify such relations, relying instead on their
statistical evidence, i.e. their co-occurrence patterns. In fact, very little work
has explored the use of NLP techniques such as lexical chaining or
discourse analysis that make use of semantic and discourse structure
within text to improve the performance of IR applications.  One of the
main objections to the use of such techniques has been the claim that
they are more computationally demanding than statistical co-occurrence
techniques. However, now that the NLP community has developed more
efficient algorithms, it will be interesting to further explore the
use of such techniques in IR applications.

We would therefore like to gather people who use lexical relations in
different subfields of IR to address non-trivial questions such as:
What types of lexical relations prove useful for different IR tasks?
What statistical models are most effective for identifying lexical
relations for different IR tasks? Can linguistic techniques for
identifying lexical relations in text, such as lexical chaining or
discourse analysis techniques, be useful for any IR tasks? How can
contiguous or non-contiguous lexical cohesive relations be identified
in text? How can we reliably evaluate and compare these techniques?

[2] Target Audience:

This workshop is intended to bring together IR and NLP researchers
working on all areas of information retrieval and using lexical
associations in information retrieval applications. The objective is
to discuss what has been achieved in this area, to establish common
themes between different approaches, and to discuss future research
directions.

[3] Areas of Interest:

Papers are invited on, but not limited to, the following topics:

* Efficient Techniques for Lexical Cohesion identification
* Scalable Algorithms for Lexical Cohesion identification
* Lexical Associations and Lexical Relations Resources
* Document Representation and Lexical Associations
* Document Ranking and Lexical Associations
* Single-Term and Phrase Information Retrieval
* Passage Retrieval and Lexical Cohesion
* Query Expansion and Lexical Associations
* Local and Global Context Analysis
* Ontology-based Query Expansion
* Question Answering and Lexical Relations
* Web Search and Lexical Cohesion
* Topic Segmentation and Lexical Cohesion
* Text Summarization and Lexical Cohesion
* Evaluation Standards and Benchmarks
* Qualitative and Quantitative Evaluations

Papers can cover one or more of these areas.

[4] Important Dates:

Paper submission deadline: May 15th, 2005
Notification: June 15th, 2005
Camera ready papers: July 1st, 2005
Workshop: August 19th, 2005

[5] Paper Submission:

Papers should follow the SIGIR 2005 instructions
(http://www.dcc.ufmg.br/eventos/sigir2005/). Papers should be
submitted electronically, in PDF format only, to Rosie Jones
[jonesr at yahoo-inc.com]. PostScript files can be converted to PDF
at http://www.ps2pdf.com/. The subject line should be

Because reviewing is blind, no author information should be included
in the paper (e.g. the names of the authors or references that could
identify the authors). An identification page must be sent in a
separate email with the subject line "SIGIR 2005 ELECTRA WORKSHOP ID
PAGE" and must include the title, author(s), keywords, number of
pages, and the name and email of the contact author.

Late submissions will not be accepted. Receipt of each submission will
be acknowledged by email to the contact author shortly after it arrives.

[6] Organising Committee:

Rosie Jones (Yahoo! Inc, United States of America)
Olga Vechtomova (University of Waterloo, Canada)
Gaël Harry Dias (University of Beira Interior, Portugal)

[7] Program Committee:

Brigitte Grau - (LIMSI, France)
Bruce Croft - (University of Massachusetts, USA)
Charlie Clarke - (University of Waterloo, Canada)
Diana Inkpen - (University of Ottawa, Canada)
Dunja Mladenic - (Jožef Stefan Institute, Slovenia)
Patrick Pantel - (University of Southern California, USA)
Egidio Terra - (Pontifícia Univ. Católica do Rio Grande do Sul, Brazil)
Gabriel Lopes - (New University of Lisbon, Portugal)
Graeme Hirst - (University of Toronto, Canada)
Hal Daumé - (University of Southern California, USA)
Helena Ahonen-Myka - (University of Helsinki, Finland)
Murat Karamuftuoglu - (Bilkent University, Turkey)
Nicola Stokes - (University College Dublin, Ireland)
Peter Turney - (National Research Council Canada, Canada)
Rafael Muñoz - (University of Alicante, Spain)

[8] Contact:

Rosie Jones
Yahoo! Overture Matching Sciences
Yahoo! Inc
74 N. Pasadena Ave, 3F
Pasadena, CA 91103
United States of America
email: jonesr at yahoo-inc.com

Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription : http://www.biomath.jussieu.fr/LN/LN-F/
English version           : http://www.biomath.jussieu.fr/LN/LN/
Archives                  : http://listserv.linguistlist.org/archives/ln.html

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
