[Corpora-List] Second CfP : ICML/UAI/COLT 2008 Workshop on Prior Knowledge for Text and Language Processing

Mon Apr 28 15:31:55 UTC 2008

WORKSHOP: PRIOR KNOWLEDGE FOR TEXT AND LANGUAGE PROCESSING

9 July 2008, Helsinki, in conjunction with ICML/UAI/COLT

CALL FOR PAPERS:  *** NOTE THE EXTENDED DEADLINE ***

Abstract submission deadline: 7 May 2008 (extended from 30 April)
Notification to authors: 22 May 2008 (extended from 15 May)
Final version: 30 June 2008
Workshop: 9 July 2008

Web page: http://prior-knowledge-language-ws.wikidot.com (please monitor 
this page for updates)

CONTEXT: The workshop is part of the Thematic Programme "Leveraging 
Complex Prior Knowledge for Learning" of the PASCAL-2 European Network 
of Excellence starting in March 2008.

GOALS: The aim of the workshop is to present and discuss recent advances 
in machine learning approaches to text and natural language processing 
that capitalize on rich prior knowledge models in these domains.

MOTIVATION: Traditionally, in Machine Learning, a strong focus has been 
put on data-driven methods that assume little a priori knowledge on the 
part of the learning mechanism. Such techniques have proven quite 
effective not only for simple pattern recognition tasks, but also, more 
surprisingly, for such tasks as language modeling in speech recognition 
using basic n-gram models. However, when the structures to be learned 
become more complex, even large training sets become sparse relative to 
the task, and this sparsity can only be mitigated if some prior 
knowledge comes into play to constrain the space of fitted models. We 
currently see a strong emerging trend in the field of machine learning 
for text and language processing to incorporate such prior knowledge for 
instance in language modeling (e.g. through non-parametric Bayesian 
priors) or in document modeling (e.g. through hierarchical graphical 
models). There are complementary attempts in the field of statistical 
computational linguistics (e.g. in statistical machine translation) to 
build hybrid systems that do not rely solely on corpus data, but also 
exploit some form of a priori grammatical knowledge, bridging the gap 
between purely data-oriented approaches and the traditional purely 
rule-based approaches, which do not rely on automatic corpus training 
but only indirectly on human observations about linguistic data. The 
domain of text and language processing thus appears as a very promising 
field for studying the interactions between prior knowledge and raw 
training data, and this workshop aims at providing a forum for 
discussing recent theoretical and practical advances in this area.

TOPICS: The workshop aims at presenting a diversity of viewpoints on 
prior knowledge for language and text processing. Discussion of the 
following topics, techniques and issues is encouraged (non-exhaustive list):

     * Prior knowledge for language modeling, parsing, translation

     * Topic modeling for document analysis and retrieval

     * Parametric and non-parametric Bayesian models in NLP

     * Graphical models embodying structural knowledge of texts

     * Complex features/kernels that incorporate linguistic knowledge;
       kernels built from generative models

     * Limitations of purely data-driven learning techniques for text
       and language applications; performance gains due to incorporation
       of prior knowledge

     * Typology of different forms of prior knowledge for NLP (knowledge
       embodied in generative Bayesian models, in MDL models, in
       ILP/logical models, in linguistic features, in representational
       frameworks, in grammatical rules…)

     * Formal principles for combining rule-based and data-based
       approaches to NLP

     * Linguistic science and cognitive models as sources of prior knowledge

FORMAT: The workshop will consist of a mix of submitted papers, invited 
talks, and discussion/panels in which different viewpoints will be 
emphasized.

CALL FOR PAPERS: Researchers interested in presenting their work at the 
workshop should send an email (preferably plain text or PDF attachment) 
to ws_pktlp at xrce.xerox.com with the following information:

TITLE
AUTHORS
ABSTRACT (corresponding to approximately two plain text pages)

Note: If you experience problems with the above email alias, please 
contact: marc (dot) dymetman (at) xrce (dot) xerox (dot) com

We expect speakers to provide a final version of their paper before the end 
of June for inclusion on the workshop home page, and authors will be 
encouraged to read the included papers prior to the meeting. A compiled 
set of papers will be distributed as working notes at the workshop.

INVITED PRESENTATIONS AND PANELISTS (partial list):

     * David Blei
     * Pedro Domingos
     * Mark Johnson
     * Dan Melamed
     * Massimiliano Pontil

PROGRAM COMMITTEE

     * Guillaume Bouchard
     * Nicola Cancedda
     * Hal Daumé III
     * Marc Dymetman
     * Tom Griffiths
     * Peter Grünwald
     * Kevin Knight
     * Mark Johnson
     * Yee Whye Teh

ORGANIZERS:

     * Guillaume Bouchard: guillaume (dot) bouchard (at) xrce (dot) xerox (dot) com
     * Hal Daumé III: hal (at) cs (dot) utah (dot) edu
     * Marc Dymetman (main contact): marc (dot) dymetman (at) xrce (dot) xerox (dot) com
     * Yee Whye Teh: yeewhye (at) gmail (dot) com

-- 
  Hal Daume III --- me AT hal3 DOT name  |  http://www DOT hal3 DOT name
  "Arrest this man, he talks in maths."  |  http://nlpers.blogspot.com
