<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
<div class="moz-text-flowed"
style="font-family: -moz-fixed; font-size: 13px;" lang="x-western">&lt;apologies
for multiple postings&gt;
<br>
<br>
#########################<br>
# Deadline April 30th #
<br>
#########################
<br>
<br>
<br>
<br>
Call for Papers <br>
COLING/ACL 2006 Workshop
<br>
<br>
INFORMATION EXTRACTION BEYOND THE DOCUMENT
<br>
22nd July 2006, Sydney, Australia
<br>
<br>
<br>
Organisers: <br>
Mary Elaine Califf (Illinois State University)
<br>
Mark A. Greenwood (University of Sheffield)
<br>
Mark Stevenson (University of Sheffield)
<br>
Roman Yangarber (University of Helsinki)
<br>
<br>
<br>
Traditional approaches to the development and evaluation of
<br>
Information Extraction (IE) systems have relied on relatively small
<br>
collections of up to a few hundred documents tagged with detailed
<br>
semantic annotations. While this paradigm has enabled rapid advances
<br>
in IE technology, it remains constrained by a dependence on annotated
<br>
documents and does not make use of the information available in large
<br>
corpora. Alternative approaches, which make use of large text
<br>
collections and inter-document information, are now beginning to
<br>
emerge -- as evidenced by a parallel emergence of interest in learning
<br>
from unlabelled data in AI in general. For example, some systems
<br>
learn extraction patterns by exploiting information about their
<br>
distribution across corpora; others exploit the redundancy of the
<br>
internet by assuming that facts with multiple mentions are more
<br>
reliable. These approaches require large amounts of unannotated text,
<br>
which is generally easy to obtain, and employ unsupervised or
<br>
minimally supervised learning algorithms, as well as related
<br>
techniques such as co-training and active learning. These alternative
<br>
approaches are complementary to the established IE paradigm based on
<br>
supervised training, and now form a cohesive trend in
<br>
recent research. They will constitute the focus of this workshop.
<br>
<br>
There are several advantages to employing large text collections for
IE.
<br>
They provide enormous amounts of training data, albeit mostly
<br>
unannotated. Facts can be extracted from, or verified across, multiple
<br>
documents. Large text collections often contain vast amounts of
<br>
redundancy in the form of multiple references to or mentions of
closely
<br>
related facts. Redundancy can be exploited in the IE setting to
<br>
identify trends and patterns within the text, e.g., by means of Data
<br>
Mining techniques.
<br>
<br>
This workshop invites new, original work on learning extraction rules
<br>
or identifying facts across document boundaries while exploiting
<br>
sizable amounts of unlabelled text in the training stage, in the
<br>
extraction stage, or both. The workshop hopes to bring together
<br>
researchers from the various related areas, such as Information
<br>
Extraction, Data Mining, biomedical text processing, Question
<br>
Answering, Information Retrieval, Machine Learning, identification of
<br>
lexical relations (hyponymy, meronymy, etc.), multi-lingual text
<br>
processing and the Semantic Web. This workshop solicits papers on all
<br>
relevant aspects, including algorithms, techniques and applications.
<br>
<br>
Topics of particular interest include:
<br>
- Extraction of information described across documents
<br>
- Integration and mutual benefits of IE and Data Mining
<br>
- Extraction of information from massive corpora (such as the Internet)
<br>
- Mutual applications and interaction between Information Extraction
<br>
and the Semantic Web
<br>
- Verification of information using external sources
<br>
- Exploiting cross-lingual and multi-lingual approaches for improving
<br>
performance in IE
<br>
<br>
<br>
------------------------- <br>
IMPORTANT DATES
<br>
------------------------- <br>
Submission Deadline: April 30th, 2006
<br>
Notification of acceptance: May 22nd, 2006
<br>
Camera-ready papers due: June 2nd, 2006
<br>
<br>
<br>
--------------------------
<br>
SUBMISSION INSTRUCTIONS
<br>
--------------------------
<br>
<br>
Authors are invited to submit original, unpublished work on the topic
<br>
areas of the workshop. Submissions should follow the standard
<br>
two-column formatting instructions for the main COLING/ACL 2006
<br>
conference. Submitted papers should be no longer than eight (8) pages,
<br>
including references. We strongly recommend the use of the
<br>
LaTeX and Microsoft Word style files, which will be available on the
<br>
main conference website.
<br>
<br>
As reviewing will be blind, the paper should not include the authors'
<br>
names and affiliations. Furthermore, self-references that reveal the
<br>
author's identity, e.g., "We previously showed (Smith, 1991) ...",
<br>
should be avoided. Instead, use citations such as "Smith previously
<br>
showed (Smith, 1991) ...". <br>
<br>
Submission will be electronic. Details will appear on the workshop web
<br>
site (<a class="moz-txt-link-freetext"
href="http://nlp.shef.ac.uk/result/iebd06">http://nlp.shef.ac.uk/result/iebd06</a>).
<br>
Questions regarding the submission procedure should be directed to
<br>
Mark Greenwood (<a class="moz-txt-link-abbreviated"
href="mailto:mark@dcs.shef.ac.uk">mark@dcs.shef.ac.uk</a>). <br>
<br>
--------------------------
<br>
WORKSHOP ORGANIZERS
<br>
--------------------------
<br>
<br>
Mary Elaine Califf
<br>
School of Information Technology, Illinois State University
<br>
<br>
Mark A. Greenwood
<br>
Department of Computer Science, University of Sheffield
<br>
<br>
Mark Stevenson
<br>
Department of Computer Science, University of Sheffield
<br>
<br>
Roman Yangarber
<br>
Department of Computer Science, University of Helsinki
<br>
<br>
<br>
--------------------------
<br>
PROGRAM COMMITTEE
<br>
--------------------------
<br>
Markus Ackermann (University of Leipzig)
<br>
Amit Bagga (AskJeeves)
<br>
Roberto Basili (University of Rome, Tor Vergata)
<br>
Antal van den Bosch (Tilburg University)
<br>
Neus Catala (Universitat Politecnica de Catalunya)
<br>
Walter Daelemans (University of Antwerp)
<br>
Jenny Rose Finkel (Stanford University)
<br>
Robert Gaizauskas (University of Sheffield)
<br>
Ralph Grishman (NYU)
<br>
Takaaki Hasegawa (NTT)
<br>
Heng Ji (NYU)
<br>
Nick Kushmerick (University College Dublin, Ireland)
<br>
Alberto Lavelli (ITC-IRST, Italy)
<br>
Gideon Mann (Johns Hopkins University)
<br>
Ion Muslea (Language Weaver Inc.)
<br>
Chikashi Nobata (Sharp, Japan)
<br>
Ellen Riloff (University of Utah)
<br>
Tony Rose (Cognia Ltd.)
<br>
Stephen Soderland (University of Washington)
<br>
Kiyotaka Uchimoto (CRL, Japan)
<br>
Yorick Wilks (University of Sheffield)
<br>
</div>
</body>
</html>