[Corpora-List] CfP: Corpus Profiling Workshop at IIiX 2008 (DEADLINE EXTENDED)

Tue Aug 12 15:34:46 UTC 2008

Call for Papers:

Corpus Profiling for Information Retrieval and Natural Language 
Processing Workshop 2008
18 October 2008
London
Submission deadline: EXTENDED to 5 September 2008
http://kmi.open.ac.uk/events/corpus-profiling/index.php

-----------------------------------
PURPOSE
-----------------------------------

We aim to bring together people from different research communities 
interested in exploring how corpus characteristics affect the behaviour 
of techniques in information retrieval and natural language processing, 
and to set out a roadmap for a shared research agenda.

It is well known in NLP and IR that the effectiveness of a technique 
depends on both the data on which it is deployed and its match with the 
task at hand. In 1973, Spärck Jones attributed differing degrees of
success at automatic classification to differences in dataset 
characteristics. Since Croft and Harper (1979), IR performance has 
repeatedly been related to collection size and other features, though no 
upper bound has been found.

The importance of data and task dependencies has been highlighted in IR, 
anaphora resolution, automatic summarization and, more recently, word sense
disambiguation. Many web/enterprise web retrieval systems rely on URL 
properties, link graph properties, click streams, and so on, with 
performance dependent on the degree to which this evidence is present 
and meaningful in a particular corpus.

Systematic exploration of features that can be used effectively to
characterise corpora has been missing from IR/NLP research. This
creates problems with replicability of experimental results and the 
development of applications.

The time is right to investigate this dependence systematically and to
track the effect of dataset profile on technique performance. Over the
past 15 years, the approaches of several subject
areas have converged with IR, as large corpora and test collections 
assume central importance in research methodologies. These areas have 
highlighted issues surrounding the role of data.

-----------------------------------
WORKSHOP FORMAT
-----------------------------------

The workshop will last one day and will be held in conjunction with the
Information Interaction in Context symposium (IIiX'2008,
http://irsg.bcs.org/iiix2008/). The workshop will have three components:

(1) invited talks in the morning, introducing the background from
different perspectives

(2) two afternoon sessions, presenting peer-reviewed papers

(3) a panel discussion (panel composed of presenters and the organizers).

-----------------------------------
TOPICS OF INTEREST
-----------------------------------

We welcome original research or position papers. We particularly
encourage postgraduate students and postdoctoral researchers to submit
papers. Topics of interest include, but are NOT LIMITED to, the
following areas:

     * Suitable features to characterise text/language variety, 
capturing known effects on technique performance with respect to a task;

     * Tasks that depend on aspects of corpus profiles (e.g., the
positive correlation of QA performance with fact frequency in a corpus);

     * Limitations of context-independent frequency-based measures, and 
exploration of measures that highlight complex dependencies;

     * Tools/techniques for characterising a feature or the extent to 
which it is manifested in a corpus;

     * Evaluation methodologies for testing feature candidates relative 
to task/technique;

     * Learnability of features (cf. meta-level learning for 
classification algorithms).

-----------------------------------
IMPORTANT DATES
-----------------------------------

5 September 2008: Paper submission due (DEADLINE EXTENDED)

20 September 2008: Notification of acceptance/rejection

26 September 2008: Camera-ready due

18 October 2008: Workshop

-----------------------------------
SUBMISSION GUIDELINES
-----------------------------------

Original technical papers, short papers and position papers are all
welcome. Please ensure that your submission does not exceed 5,000 words
in length. Use a 10-point font in double-column layout for body text, and
12-point bold for headings. Please send your submission in PDF to all
three organizers (A.Deroeck at open.ac.uk; d.song at open.ac.uk;
udo at essex.ac.uk) with subject "Corpus Profiling workshop submission".

We will publish the accepted papers electronically through BCS's
Electronic Workshops in Computing (eWiC), together with the extended
abstracts of invited talks and a summary of the panel discussion. We will
seek to pursue the research thread through further workshops at relevant
conferences. We plan to organize a post-workshop special issue of a
suitable IR- or NLP-related journal.

-----------------------------------
PROGRAMME COMMITTEE
-----------------------------------

Anne De Roeck (The Open University)
Udo Kruschwitz (University of Essex)
Ruslan Mitkov (University of Wolverhampton)
Nikolaos Nanas (CERETETH, Greece)
Michael Oakes (University of Sunderland)
Ian Ruthven (University of Strathclyde)
Dawei Song (KMi, The Open University)
Tomek Strzalkowski (SUNY Albany)
Alistair Willis (The Open University)

For further information please visit 
http://kmi.open.ac.uk/events/corpus-profiling/index.php
