<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

  <title>Call for Papers: Information Extraction Beyond the Document (COLING/ACL 2006 Workshop)</title>

</head>

<body text="#000000" bgcolor="#ffffff">

<div class="moz-text-flowed"

 style="font-family: -moz-fixed; font-size: 13px;" lang="x-western">

<br>

<br>

                    #########################<br>

                    # Deadline April 30th   #

<br>

                    #########################

<br>

<br>

<br>

<br>

                        Call for Papers <br>

                    COLING/ACL 2006 Workshop

<br>

<br>

            INFORMATION EXTRACTION BEYOND THE DOCUMENT

<br>

               22nd July 2006, Sydney, Australia

<br>

<br>

<br>

                       Organisers:         <br>

         Mary Elaine Califf (Illinois State University)

<br>

         Mark A. Greenwood (University of Sheffield)

<br>

         Mark Stevenson (University of Sheffield)

<br>

         Roman Yangarber (University of Helsinki)

<br>

<br>

<br>

Traditional approaches to the development and evaluation of

<br>

Information Extraction (IE) systems have relied on relatively small

<br>

collections of up to a few hundred documents tagged with detailed

<br>

semantic annotations.  While this paradigm has enabled rapid advances

<br>

in IE technology, it remains constrained by a dependence on annotated

<br>

documents and does not make use of the information available in large

<br>

corpora.  Alternative approaches, which make use of large text

<br>

collections and inter-document information, are now beginning to

<br>

emerge, paralleling the growing interest in learning from unlabelled

<br>

data across AI in general.  For example, some systems

<br>

learn extraction patterns by exploiting information about their

<br>

distribution across corpora; others exploit the redundancy of the

<br>

internet by assuming that facts with multiple mentions are more

<br>

reliable.  These approaches require large amounts of unannotated text,

<br>

which is generally easy to obtain, and employ unsupervised or

<br>

minimally supervised learning algorithms, as well as related

<br>

techniques such as co-training and active learning.  These alternative

<br>

approaches are complementary to the established IE paradigm based on

<br>

supervised training, and are now forming a coherent trend in

<br>

recent research. They are the focus of this workshop.

<br>

<br>

There are several advantages to employing large text collections for IE.

<br>

They provide enormous amounts of training data, albeit mostly

<br>

unannotated.  Facts can be extracted from, or verified across, multiple

<br>

documents.  Large text collections often contain vast amounts of

<br>

redundancy in the form of multiple references to, or mentions of,

<br>

closely related facts. This redundancy can be exploited in the IE setting to

<br>

identify trends and patterns within the text, e.g., by means of Data

<br>

Mining techniques.

<br>

<br>

This workshop invites new, original work on learning extraction rules

<br>

or identifying facts across document boundaries while exploiting

<br>

sizable amounts of unlabelled text in the training stage, in the

<br>

extraction stage, or both. The workshop hopes to bring together

<br>

researchers from related areas such as Information

<br>

Extraction, Data Mining, biomedical text processing, Question

<br>

Answering, Information Retrieval, Machine Learning, identification of

<br>

lexical relations (hyponymy, meronymy, etc.), multi-lingual text

<br>

processing and the Semantic Web.  This workshop solicits papers on all

<br>

relevant aspects, including algorithms, techniques and applications.

<br>

<br>

Topics of particular interest include:

<br>

- Extraction of information described across documents

<br>

- Integration and mutual benefits of IE and Data Mining

<br>

- Extraction of information from massive corpora (such as the Internet)

<br>

- Mutual applications and interaction between Information Extraction

<br>

 and the Semantic Web

<br>

- Verification of information using external sources

<br>

- Exploiting cross-lingual and multi-lingual approaches to improve

<br>

 performance in IE

<br>

<br>

<br>

------------------------- <br>

IMPORTANT DATES

<br>

------------------------- <br>

Submission Deadline:         April 30th, 2006

<br>

Notification of acceptance:  May 22nd, 2006

<br>

Camera-ready papers due:     June 2nd, 2006

<br>

<br>

<br>

--------------------------

<br>

SUBMISSION INSTRUCTIONS

<br>

--------------------------

<br>

<br>

Authors are invited to submit original, unpublished work on the topic

<br>

areas of the workshop. Submissions should follow the standard

<br>

two-column formatting instructions for the main COLING/ACL 2006

<br>

conference. Submitted papers should be no longer than eight (8) pages,

<br>

including references. We strongly recommend the use of the

<br>

LaTeX and Microsoft Word style files, which will be available on the

<br>

main conference website.

<br>

<br>

As reviewing will be blind, papers should not include the authors'

<br>

names and affiliations. Furthermore, self-references that reveal the

<br>

author's identity, e.g., "We previously showed (Smith, 1991) ...",

<br>

should be avoided. Instead, use citations such as "Smith previously

<br>

showed (Smith, 1991) ...". <br>

<br>

Submission will be electronic. Details will appear on the workshop web

<br>

site (<a class="moz-txt-link-freetext"

 href="http://nlp.shef.ac.uk/result/iebd06">http://nlp.shef.ac.uk/result/iebd06</a>).

<br>

Questions regarding the submission procedure should be directed to

<br>

Mark Greenwood (<a class="moz-txt-link-abbreviated"

 href="mailto:mark@dcs.shef.ac.uk">mark@dcs.shef.ac.uk</a>). <br>

<br>

--------------------------

<br>

WORKSHOP ORGANIZERS

<br>

--------------------------

<br>

<br>

Mary Elaine Califf

<br>

School of Information Technology, Illinois State University

<br>

<br>

Mark A. Greenwood

<br>

Department of Computer Science, University of Sheffield

<br>

<br>

Mark Stevenson

<br>

Department of Computer Science, University of Sheffield

<br>

<br>

Roman Yangarber

<br>

Department of Computer Science, University of Helsinki

<br>

<br>

<br>

--------------------------

<br>

PROGRAM COMMITTEE

<br>

--------------------------

<br>

Markus Ackermann        (University of Leipzig)

<br>

Amit Bagga              (Ask Jeeves)

<br>

Roberto Basili          (University of Rome, Tor Vergata)

<br>

Antal van den Bosch     (Tilburg University)

<br>

Neus Catala             (Universitat Polit&egrave;cnica de Catalunya)

<br>

Walter Daelemans        (University of Antwerp)

<br>

Jenny Rose Finkel       (Stanford University)

<br>

Robert Gaizauskas       (University of Sheffield)

<br>

Ralph Grishman          (NYU)

<br>

Takaaki Hasegawa        (NTT)

<br>

Heng Ji                 (NYU)

<br>

Nick Kushmerick         (University College Dublin, Ireland)

<br>

Alberto Lavelli         (ITC-IRST, Italy)

<br>

Gideon Mann             (Johns Hopkins University)

<br>

Ion Muslea              (Language Weaver Inc.)

<br>

Chikashi Nobata         (Sharp, Japan)

<br>

Ellen Riloff            (University of Utah)

<br>

Tony Rose               (Cognia Ltd.)

<br>

Stephen Soderland       (University of Washington)

<br>

Kiyotaka Uchimoto       (CRL, Japan)

<br>

Yorick Wilks            (University of Sheffield)

<br>


</div>

</body>

</html>