19.3084, Confs: Computational Linguistics/UK

Sat Oct 11 15:26:21 UTC 2008

LINGUIST List: Vol-19-3084. Sat Oct 11 2008. ISSN: 1068 - 4875.

Subject: 19.3084, Confs: Computational Linguistics/UK

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Stephanie Morse <morse at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 07-Oct-2008
From: Udo Kruschwitz < udo at essex.ac.uk >
Subject: Corpus Profiling Workshop at IIiX 2008

-------------------------Message 1 ---------------------------------- 
Date: Sat, 11 Oct 2008 11:25:00
From: Udo Kruschwitz [udo at essex.ac.uk]
Subject:  Corpus Profiling Workshop at IIiX 2008

Corpus Profiling Workshop at IIiX 2008 

Date: 18-Oct-2008 - 18-Oct-2008 
Location: London, United Kingdom 
Contact: Udo Kruschwitz 
Contact Email: udo at essex.ac.uk 
Meeting URL: http://kmi.open.ac.uk/events/corpus-profiling/index.php 

Linguistic Field(s): Computational Linguistics 

Meeting Description: 

Corpus Profiling for Information Retrieval and Natural Language Processing
Workshop 2008

We aim to bring together people from different research communities interested
in exploring how corpus characteristics affect the behaviour of techniques in
information retrieval and natural language processing, and to set out a roadmap
for a shared research agenda.

It is well known in NLP and IR that the effectiveness of a technique depends on
both the data on which it is deployed and its match with the task at hand. In
1973, Spärck Jones attributed differing degrees of success at automatic
classification to differences in dataset characteristics. Since Croft and Harper
(1979), IR performance has repeatedly been related to collection size and other
features, though no upper bound has been found.

The importance of data and task dependencies has been highlighted in IR,
anaphora resolution, automatic summarization and, more recently, in word sense
disambiguation. Many web/enterprise web retrieval systems rely on URL
properties, link graph properties, click streams, and so on, with performance
dependent on the degree to which this evidence is present and meaningful in a
particular corpus.

Systematic exploration of the features that can be used to characterise corpora
effectively has been missing from IR/NLP research. This gap creates problems for
the replicability of experimental results and the development of applications.

The time is right to pursue this dependence systematically, tracking the effect
of dataset profile on technique performance. Over the past
15 years, the approaches of several subject areas have converged with IR, as
large corpora and test collections assume central importance in research
methodologies. These areas have highlighted issues surrounding the role of data. 

Call for Participation

Corpus Profiling for Information Retrieval and Natural Language Processing
Workshop 2008
18 October 2008
London
http://kmi.open.ac.uk/events/corpus-profiling/index.php

***Please note that there is no on-site registration***

Invited Speakers

Anne De Roeck (The Open University)
Ruslan Mitkov (University of Wolverhampton)
Michael Oakes (University of Sunderland)
Leif Azzopardi (University of Glasgow)
Nikolaos Nanas (TBC), Centre for Research and Technology - Thessaly (CERETETH)

Accepted Papers

Automatic Natural Language Style Classification and Transformation
Foaad Khosmood and Robert A. Levinson (University of California, Santa Cruz)

Genre Analysis of Structured E-mails for Corpus Profiling
Malcolm Clark (The Robert Gordon University), Ian Ruthven (University of
Strathclyde), Patrik O'Brian Holt (The Robert Gordon University)

Lexical Profiling of Existing Web Directories to Support Fine-grained
Topic-Focused Web Crawling
Mark Greenwood, Goran Nenadic (University of Manchester)

Building a Document Genre Corpus: A Profile of the KRYS I Corpus
Vera F. Berninger, Yunhyong Kim and Seamus Ross (University of Glasgow)

Distributional Lexical Semantics for Stop Lists
Neil Cooke, Lee Gillam (University of Surrey)

Your Contribution

We are looking forward to a very productive workshop with as much interaction as
possible. As stated in the workshop aims, we aim to set out a roadmap for a
shared research agenda. To do this most effectively, we are asking participants
to provide some input stating their views on corpus profiling for NLP and IR.
Ideally this would be a short paragraph, suggestions for discussion, or even a
simple statement, submitted to the workshop organizers before the workshop.
Any input is most welcome!

Registration

The registration fee will be £80. Registration is through the IIiX registration
site: http://irsg.bcs.org/iiix2008/registration.php
