[Corpora-List] CFP: Workshop on Semi-supervised Learning for NLP at NAACL 2009

Mon Oct 13 16:31:38 UTC 2008

================================================
NAACL HLT 2009 Workshop on
Semi-supervised Learning for Natural Language Processing

June 4 or 5, 2009, Boulder, Colorado, USA
http://sites.google.com/site/sslnlp/

Call for Papers
(Submission deadline: March 6, 2009)
================================================

Machine learning, be it supervised or unsupervised, has become an 
indispensable tool for natural language processing (NLP) researchers. 
Highly developed supervised training techniques have led to 
state-of-the-art performance for many NLP tasks and provide foundations 
for deployable NLP systems. Similarly, unsupervised methods, such as 
those based on EM training, have also been influential, with 
applications ranging from grammar induction to bilingual word alignment 
for machine translation.

Unfortunately, given the limited availability of annotated data, and the 
non-trivial cost of obtaining additional annotated data, progress on 
supervised learning often yields diminishing returns. Unsupervised 
learning, on the other hand, is not bound by the same data resource 
limits. However, unsupervised learning is significantly harder than 
supervised learning and, although intriguing, has not been able to 
produce consistently successful results for complex structured 
prediction problems characteristic of NLP.

It is becoming increasingly important to leverage both types of data 
resources, labeled and unlabeled, to achieve the best performance in 
challenging NLP problems. Consequently, interest in semi-supervised 
learning has grown in the NLP community in recent years. Yet, although 
several papers have demonstrated promising results with semi-supervised 
learning for problems such as tagging and parsing, we suspect that good 
results might not be easy to achieve across the board. Many 
semi-supervised learning methods (e.g. transductive SVM, graph-based 
methods) were originally developed for binary classification 
problems. NLP problems often pose new challenges to these techniques, 
involving more complex structure that can violate many of the underlying 
assumptions.

We believe there is a need to take a step back and investigate why and 
how auxiliary unlabeled data can truly improve training for NLP tasks.

In particular, many open questions remain:

 1. Problem Structure: What are the different classes of NLP problem 
structures (e.g. sequences, trees, N-best lists) and what algorithms are 
best suited for each class? For instance, can graph-based algorithms be 
successfully applied to sequence-to-sequence problems like machine 
translation, or are self-training and feature-based methods the only 
reasonable choices for these problems?

 2. Background Knowledge: What kinds of NLP-specific background 
knowledge can we exploit to aid semi-supervised learning? Recent 
learning paradigms such as constraint-driven learning and prototype 
learning take advantage of our domain knowledge about particular NLP 
tasks; they represent a move away from purely data-agnostic methods and 
are good examples of how linguistic intuition can drive algorithm 
development.

 3. Scalability: NLP datasets are often large. What are the scalability 
challenges and solutions for applying existing semi-supervised learning 
algorithms to NLP data?

 4. Evaluation and Negative Results: What can we learn from negative 
results? Can we make an educated guess as to when semi-supervised 
learning might outperform supervised or unsupervised learning based on 
what we know about the NLP problem?

 5. To Use or Not To Use: Should semi-supervised learning only be 
employed in low-resource languages/tasks (i.e. little labeled data, much 
unlabeled data), or should we expect gains even in high-resource 
scenarios (e.g. improving on a supervised system that is already more 
than 95% accurate)?

This workshop aims to bring together researchers dedicated to making 
semi-supervised learning work for NLP problems. Our goal is to help 
build a community of researchers and foster deep discussions about 
insights, speculations, and results (both positive and negative) that 
may otherwise not appear in a technical paper at a major conference. We 
welcome submissions that address any of the above questions or other 
relevant issues, and especially encourage authors to provide a deep 
analysis of data and results. Papers will be limited to 8 pages and will 
be selected based on quality and relevance to workshop goals.

IMPORTANT DATES:
March 6, 2009: Submission deadline
March 30, 2009: Notification of acceptance
April 12, 2009: Camera-ready copies due
June 4 or 5, 2009: Workshop held in conjunction with NAACL HLT (exact 
date to be announced)

PROGRAM COMMITTEE:
Steven Abney (University of Michigan, USA)
Yasemin Altun (Max Planck Institute for Biological Cybernetics, Germany)
Tim Baldwin (University of Melbourne, Australia)
Shane Bergsma (University of Alberta, Canada)
Antal van den Bosch (Tilburg University, The Netherlands)
John Blitzer (UC Berkeley, USA)
Ming-Wei Chang (UIUC, USA)
Walter Daelemans (University of Antwerp, Belgium)
Hal Daume III (University of Utah, USA)
Kevin Gimpel (Carnegie Mellon University, USA)
Andrew Goldberg (University of Wisconsin, USA)
Liang Huang (Google Research, USA)
Rie Johnson [formerly, Ando] (RJ Research Consulting)
Katrin Kirchhoff (University of Washington, USA)
Percy Liang (UC Berkeley, USA)
Gary Geunbae Lee (POSTECH, Korea)
Gina-Anne Levow (University of Chicago, USA)
Gideon Mann (Google, USA)
David McClosky (Brown University, USA)
Ray Mooney (UT Austin, USA)
Hwee Tou Ng (National University of Singapore, Singapore)
Vincent Ng (UT Dallas, USA)
Miles Osborne (University of Edinburgh, UK)
Mari Ostendorf (University of Washington, USA)
Chris Pinchak (University of Alberta, Canada)
Dragomir Radev (University of Michigan, USA)
Dan Roth (UIUC, USA)
Anoop Sarkar (Simon Fraser University, Canada)
Dale Schuurmans (University of Alberta, Canada)
Akira Shimazu (JAIST, Japan)
Jun Suzuki (NTT, Japan)
Yee Whye Teh (University College London, UK)
Kristina Toutanova (Microsoft Research, USA)
Jason Weston (NEC, USA)
Tong Zhang (Rutgers University, USA)
Ming Zhou (Microsoft Research Asia, China)
Xiaojin (Jerry) Zhu (University of Wisconsin, USA)

ORGANIZERS AND CONTACT:
- Qin Wang (Yahoo!)
- Kevin Duh (University of Washington)
- Dekang Lin (Google Research)
Email: ssl.nlp2009 at gmail.com
Website: http://sites.google.com/site/sslnlp/
