[Corpora-List] Call for Papers: ELECTRA 2005

Sat Mar 26 14:37:39 UTC 2005

[Apologies for Multiple Postings]

============================CALL FOR PAPERS============================

     ELECTRA Workshop on Methodologies and Evaluation of Lexical
           Cohesion Techniques in Real-world Applications
                       (Beyond Bag of Words)

           In association with the 28th Annual International
            ACM SIGIR Conference on Research and Development
                 in Information Retrieval (SIGIR 2005)

                   Sponsored by Yahoo! Research Labs

                    Pestana Bahia, Salvador, Brazil

                            August 19, 2005

            http://research.yahoo.com/workshops/electra2005/

============================CALL FOR PAPERS============================

GUIDELINES:

[1] Description
[2] Target Audience
[3] Areas of Interest
[4] Important Dates
[5] Paper Submission
[6] Organising Committee
[7] Program Committee
[8] Contact

----------------
[1] Description:
----------------

Lexical cohesion can be subdivided into two distinct areas: (1) lexical
associations, which embody a wide spectrum of language phenomena such as
named entities, multiword units, collocations and word co-occurrences,
and (2) lexical relations, which provide evidence of the semantic and
discourse structure of text through relations between terms over large
distances.

The central goal of this workshop is to bring together researchers in NLP
and IR to discuss the use of lexical cohesion in text applications, such
as document and passage retrieval, question answering, topic segmentation
and text summarization. Indeed, despite the fact that both communities are
working with the same material (human language), collaboration between
them has so far been limited.

In this workshop we are interested in identifying the successes and
failures of integrating lexical cohesion into real-world IR applications.
On the one hand, lexical cohesion has received much attention in Information
Retrieval research throughout its more than 30-year history, but so far
with mixed results. On the other hand, a considerable amount of research
has been devoted to this subject, both in terms of theory and practice, by
the Natural Language Processing community, but with limited evaluation in
real-world applications. It is clear that we are at a point where both
communities should meet in order to discuss related issues. This is the
objective of this workshop.

In particular, we will address two questions that are of great importance
for real-world IR applications.

1) Efficient methodologies for Lexical Cohesion identification

Lexical cohesion has received attention in IR research since its outset.
We can point to (a) the identification and the use of multiword
units for indexing and search, and (b) the extraction of long-distance
lexical relations for tasks such as passage retrieval, topic segmentation
or text summarization.

On the one hand, the interest in multiword units (or phrases) can be
partially attributed to the fact that phrases typically have a higher
information content and specificity than single words, and therefore
represent the concepts expressed in text more accurately than single terms.

On the other hand, interest in long-distance lexical relations in
text has been motivated in IR research by the realization of the limitations
of most IR models that assume term independence in text. As a consequence,
a number of techniques have been developed to improve on term-independence
models, such as passage retrieval and query expansion techniques.

The choice of the methodologies and techniques for these tasks has
always been restricted by the problem of efficiency that is critical
for real-world IR applications. Indeed, real-world IR applications are
constrained by variables such as processing time and memory space.
Identifying and extracting lexical associations and lexical relations
is a computationally intensive process. In recent years, new algorithms
and new technologies have been proposed to introduce lexical cohesion
techniques into large-scale applications, thus avoiding previously
intractable implementations.

Previous workshops on lexical cohesion have mainly focused on the
unconstrained extraction process. In this workshop, we would like to focus
on the comparison of different factors that can influence the scalability
of the treatment of lexical cohesion in real-world applications, namely
data structures, algorithms, parallel and distributed computing or grid
computing. We would also be interested in new methodologies for lexical
cohesion that may easily scale to real-world applications based on complexity
measurements.

2) Evaluation of the benefits of Lexical Cohesion in IR applications

Contiguous lexical associations have often been used in experimental IR
systems. Different techniques have been studied for this purpose:
(a) statistical methods based on co-occurrence statistics or n-gram language
modeling techniques, (b) hybrid techniques based on simple statistics and
shallow linguistic techniques such as part-of-speech tagging and noun-phrase
chunking, and (c) knowledge-based techniques. However, the importance of the
contribution of phrase matching has not been systematically quantified.
Moreover, the evaluation of such techniques is difficult in IR applications,
as the number of environment variables is very large and each system combines
a variety of indexing and matching techniques. Therefore, a more focused
and systematic approach towards analyzing the uses of lexical associations
in IR and their evaluation is needed. This workshop will provide a framework
for such analysis, and will present for discussion a number of challenging
questions regarding the use of lexical associations in text. In particular
we will ask questions such as: How should multiword units be incorporated
into IR models designed for single terms? What weighting models can be used
for them? How should they be matched against their lexical-syntactic variants
in text? How should we handle non-contiguous lexical associations? How can we
avoid over-weighting a phrase occurrence in a document matching more than one
phrase in the query? These are only a few of the questions in a vast field
of research full of unsolved problems.

In contrast with contiguous lexical units, relations between
non-contiguous lexical units are important building blocks of the text,
forming its lexical cohesion. Indeed, the complete meaning of a word
in text can only be realized when it is interpreted in combination with
the surrounding words, forming lexical cohesive ties with them. These lexical
relations have been used for a number of IR tasks, for example query
expansion, passage retrieval, topic segmentation and text summarization.
However, most of the techniques do not use deep semantic or discourse
structure information in identifying such relations, relying instead
on statistical evidence, i.e. their co-occurrence patterns. In fact,
very little work has explored the use of NLP techniques such as lexical
chaining or discourse analysis that make use of semantic and discourse
structure within text to improve the performance of IR applications.
One of the main objections to the use of such techniques has been the
claim that they are more computationally demanding than statistical
co-occurrence techniques. However, with the development of more efficient
algorithms by the NLP community, it will be interesting to
further explore the use of such techniques in IR applications.

As a consequence, we would like to gather people who use lexical relations
in different subfields of IR. The questions to be addressed are non-trivial:
What types of lexical relations prove useful for different IR tasks? What
statistical models are most effective for the identification of lexical
relations for different IR tasks? Can linguistic techniques for identifying
lexical relations in text, such as lexical chaining or discourse analysis
techniques, be useful for any IR tasks? How can contiguous or non-contiguous
lexical cohesive relations be identified in text? How can we reliably
evaluate and compare these techniques?

--------------------
[2] Target Audience:
--------------------

This workshop is intended to bring together IR and NLP researchers
working on all areas of information retrieval and using lexical
associations in information retrieval applications. The objective is to
discuss what has been achieved in this area, to establish common
themes between different approaches, and to discuss future research
directions.

----------------------
[3] Areas of Interest:
----------------------

Papers are invited on, but not limited to, the following topics:

* Efficient Techniques for Lexical Cohesion identification
* Scalable Algorithms for Lexical Cohesion identification
* Lexical Associations and Lexical Relations Resources
* Document Representation and Lexical Associations
* Document Ranking and Lexical Associations
* Single-Term and Phrase Information Retrieval
* Passage Retrieval and Lexical Cohesion
* Query Expansion and Lexical Associations
* Local and Global Context Analysis
* Ontology-based Query Expansion
* Question Answering and Lexical Relations
* Web Search and Lexical Cohesion
* Topic Segmentation and Lexical Cohesion
* Text Summarization and Lexical Cohesion
* Evaluation Standards and Benchmarks
* Qualitative and Quantitative Evaluations

Papers can cover one or more of these areas.

--------------------
[4] Important Dates:
--------------------

Paper submission deadline: May 15th, 2005
Notification: June 15th, 2005
Camera ready papers: July 1st, 2005
Workshop: August 19th, 2005

---------------------
[5] Paper Submission:
---------------------

Papers should follow the SIGIR 2005 instructions
(http://www.dcc.ufmg.br/eventos/sigir2005/). Papers should
be submitted electronically, in PDF format only, to Rosie Jones
[jonesr at yahoo-inc.com]. PostScript files can be converted
to PDF at http://www.ps2pdf.com/. The subject
line should be "SIGIR 2005 ELECTRA WORKSHOP PAPER SUBMISSION".

Because reviewing is blind, no author information should be included
as part of the paper (i.e. the names of the authors and references
that could identify the authors). An identification page must be sent
in a separate email with the subject line
"SIGIR 2005 ELECTRA WORKSHOP ID PAGE" and must include title, author(s),
keywords, page number and name and email of the contact author.

Late submissions will not be accepted. Notification of receipt will
be emailed to the contact author shortly after receipt.

-------------------------
[6] Organising Committee:
-------------------------

Rosie Jones (Yahoo! Inc, United States of America)
Olga Vechtomova (University of Waterloo, Canada)
Gaël Harry Dias (University of Beira Interior, Portugal)

----------------------
[7] Program Committee:
----------------------

Brigitte Grau - (LIMSI, France)
Bruce Croft - (University of Massachusetts, USA)
Charlie Clarke - (University of Waterloo, Canada)
Diana Inkpen - (University of Ottawa, Canada)
Dunja Mladenic - (Jožef Stefan Institute, Slovenia)
Patrick Pantel - (University of Southern California, USA)
Egidio Terra - (Pontifícia Univ. Católica do Rio Grande do Sul, Brazil)
Gabriel Lopes - (New University of Lisbon, Portugal)
Graeme Hirst - (University of Toronto, Canada)
Hal Daume - (University of Southern California, USA)
Helena Ahonen-Myka - (University of Helsinki, Finland)
Murat Karamuftuoglu - (Bilkent University, Turkey)
Nicola Stokes - (University College Dublin, Ireland)
Peter Turney - (National Research Council Canada, Canada)
Rafael Muñoz - (University of Alicante, Spain)

------------
[8] Contact:
------------

Rosie Jones
Yahoo! Overture Matching Sciences
Yahoo! Inc
74 N. Pasadena Ave, 3F
Pasadena, CA 91103
United States of America
email: jonesr at yahoo-inc.com