[Corpora-List] Second CFP: COLING/ACL workshop "Information Extraction Beyond the Document"
Mark Stevenson
M.Stevenson at dcs.shef.ac.uk
Mon Mar 13 10:14:14 UTC 2006
<apologies for multiple postings>
Call for Papers
COLING/ACL 2006 Workshop
INFORMATION EXTRACTION BEYOND THE DOCUMENT
22nd July 2006, Sydney, Australia
Organisers:
Mary Elaine Califf
(Illinois State University)
Mark A. Greenwood
(University of Sheffield)
Mark Stevenson
(University of Sheffield)
Roman Yangarber
(University of Helsinki)
Traditional approaches to the development and evaluation of
Information Extraction (IE) systems have relied on relatively small
collections of up to a few hundred documents tagged with detailed
semantic annotations. While this paradigm has enabled rapid advances
in IE technology, it remains constrained by a dependence on annotated
documents and does not make use of the information available in large
corpora. Alternative approaches, which make use of large text
collections and inter-document information, are now beginning to
emerge, as evidenced by a parallel growth of interest in learning
from unlabelled data in AI in general. For example, some systems
learn extraction patterns by exploiting information about their
distribution across corpora; others exploit the redundancy of the
Internet by assuming that facts with multiple mentions are more
reliable. These approaches require large amounts of unannotated text,
which is generally easy to obtain, and employ unsupervised or
minimally supervised learning algorithms, as well as related
techniques such as co-training and active learning. These alternative
approaches are complementary to the established IE paradigm based on
supervised training, and are now forming a cohesive emergent trend in
recent research. They will constitute the focus of this workshop.
There are several advantages to employing large text collections for
IE. They provide enormous amounts of training data, albeit mostly
unannotated. Facts can be extracted from, or verified across,
multiple documents. Large text collections often contain vast amounts of
redundancy in the form of multiple references to or mentions of
closely related facts. Redundancy can be exploited in the IE setting to
identify trends and patterns within the text, e.g., by means of Data
Mining techniques.
This workshop invites new, original work on learning extraction rules
or identifying facts across document boundaries while exploiting
sizable amounts of unlabelled text in the training stage, in the
extraction stage, or both. The workshop hopes to bring together
researchers from various related areas, such as Information
Extraction, Data Mining, biomedical text processing, Question
Answering, Information Retrieval, Machine Learning, identification of
lexical relations (hyponymy, meronymy, etc.), multi-lingual text
processing and the Semantic Web. This workshop solicits papers on all
relevant aspects, including algorithms, techniques and applications.
Topics of particular interest include:
- Extraction of information described across documents
- Integration and mutual benefits of IE and Data Mining
- Extraction of information from massive corpora (such as the
Internet)
- Mutual applications and interaction between Information Extraction
and the Semantic Web
- Verification of information using external sources
- Exploiting cross-lingual and multi-lingual approaches for improving
performance in IE
-------------------------
IMPORTANT DATES
-------------------------
Submission Deadline: March 31st, 2006
Notification of acceptance: May 12th, 2006
Camera-ready papers due: May 29th, 2006
--------------------------
SUBMISSION INSTRUCTIONS
--------------------------
Authors are invited to submit original, unpublished work on the topic
areas of the workshop. Submissions should follow the standard
two-column formatting instructions for the main COLING/ACL 2006
conference. Submitted papers should be no longer than eight (8) pages
in length, including references. We strongly recommend the use of the
LaTeX and Microsoft Word style files that will be available on the
main conference website.
As reviewing will be blind, the paper should not include the authors'
names and affiliations. Furthermore, self-references that reveal the
author's identity, e.g., "We previously showed (Smith, 1991) ...",
should be avoided. Instead, use citations such as "Smith previously
showed (Smith, 1991) ...".
Submission will be electronic. Details will appear on the workshop web
site (http://nlp.shef.ac.uk/result/iebd06).
Questions regarding the submission procedure should be directed to
Mark Greenwood (mark at dcs.shef.ac.uk).
--------------------------
WORKSHOP ORGANIZERS
--------------------------
Mary Elaine Califf
School of Information Technology, Illinois State University
Mark A. Greenwood
Department of Computer Science, University of Sheffield
Mark Stevenson
Department of Computer Science, University of Sheffield
Roman Yangarber
Department of Computer Science, University of Helsinki
--------------------------
PROGRAM COMMITTEE
--------------------------
Markus Ackermann (University of Leipzig)
Amit Bagga (AskJeeves)
Roberto Basili (University of Rome, Tor Vergata)
Antal van den Bosch (Tilburg University)
Neus Català (Universitat Politècnica de Catalunya)
Walter Daelemans (University of Antwerp)
Jenny Rose Finkel (Stanford University)
Robert Gaizauskas (University of Sheffield)
Ralph Grishman (NYU)
Takaaki Hasegawa (NTT)
Heng Ji (NYU)
Nick Kushmerick (University College Dublin, Ireland)
Alberto Lavelli (ITC-IRST, Italy)
Gideon Mann (Johns Hopkins University)
Ion Muslea (Language Weaver Inc.)
Chikashi Nobata (Sharp, Japan)
Ellen Riloff (University of Utah)
Tony Rose (Cognia Ltd.)
Stephen Soderland (University of Washington)
Kiyotaka Uchimoto (CRL, Japan)
Yorick Wilks (University of Sheffield)