[Corpora-List] Call for Participation: Corpus Profiling Workshop at IIiX 2008
Kruschwitz U
udo at essex.ac.uk
Tue Oct 7 14:39:16 UTC 2008
-----------------------------------
CALL FOR PARTICIPATION
-----------------------------------
Corpus Profiling for Information Retrieval and Natural Language
Processing Workshop 2008
18 October 2008
London
http://kmi.open.ac.uk/events/corpus-profiling/index.php
***Please note that there is no on-site registration***
-----------------------------------
PURPOSE
-----------------------------------
We aim to bring together people from different research communities
interested in exploring how corpus characteristics affect the behaviour
of techniques in information retrieval and natural language processing,
and to set out a roadmap for a shared research agenda.
It is well known in NLP and IR that the effectiveness of a technique
depends on both the data on which it is deployed and its match with the
task at hand. In 1973, Spärck-Jones attributed differing degrees of
success at automatic classification to differences in dataset
characteristics. Since Croft and Harper (1979), IR performance has
repeatedly been related to collection size and other features, though no
upper bound has been found.
The importance of data and task dependencies has been highlighted in IR,
anaphora resolution, automatic summarization and, more recently, word sense
disambiguation. Many web/enterprise web retrieval systems rely on URL
properties, link graph properties, click streams, and so on, with
performance dependent on the degree to which this evidence is present
and meaningful in a particular corpus.
Systematic exploration of features that can be used effectively to
characterise corpora has been missing from IR/NLP research. This
creates problems with replicability of experimental results and the
development of applications.
The time is right to study this dependence systematically and to track
the effect of dataset profile on technique performance. Over the past
15 years, the approaches of several subject
areas have converged with IR, as large corpora and test collections
assume central importance in research methodologies. These areas have
highlighted issues surrounding the role of data.
-----------------------------------
INVITED SPEAKERS
-----------------------------------
Anne De Roeck (The Open University)
Ruslan Mitkov (University of Wolverhampton)
Michael Oakes (University of Sunderland)
Leif Azzopardi (University of Glasgow)
Nikolaos Nanas (Centre for Research and Technology - Thessaly, CERETETH) (TBC)
-----------------------------------
ACCEPTED PAPERS
-----------------------------------
Automatic Natural Language Style Classification and Transformation
Foaad Khosmood and Robert A. Levinson (University of California, Santa Cruz)
Genre Analysis of Structured E-mails for Corpus Profiling
Malcolm Clark (The Robert Gordon University), Ian Ruthven (University
of Strathclyde), Patrik O'Brian Holt (The Robert Gordon University)
Lexical Profiling of Existing Web Directories to Support Fine-grained
Topic-Focused Web Crawling
Mark Greenwood, Goran Nenadic (University of Manchester)
Building a document genre corpus: a profile of the KRYS I corpus
Vera F. Berninger, Yunhyong Kim and Seamus Ross (University of Glasgow)
Distributional Lexical Semantics for Stop Lists
Neil Cooke, Lee Gillam (University of Surrey)
-----------------------------------
YOUR CONTRIBUTION
-----------------------------------
We are looking forward to a very productive workshop with as much
interaction as possible. As stated in the workshop aims, we intend to set
out a roadmap for a shared research agenda. To do this most effectively
we are asking participants to provide some input stating their views on
corpus profiling for NLP and IR. Ideally this would be a short
paragraph, a few suggestions for discussion or even a simple statement,
submitted to the workshop organizers before the workshop. Any
input is most welcome!
-----------------------------------
REGISTRATION
-----------------------------------
The registration fee will be £80. Registration is through the IIiX
registration site: http://irsg.bcs.org/iiix2008/registration.php
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora