2nd Call for Papers: Workshop on Robust Unsupervised and Semisupervised Methods (ROBUS2011)

Anders Søgaard soegaard at HUM.KU.DK
Fri May 20 07:49:40 UTC 2011


*******************************************************************
                                      ROBUS2011
         Workshop on Robust Unsupervised and Semisupervised Methods
                      in Natural Language Processing
                  https://sites.google.com/site/robus2011/
                     (in conjunction with RANLP 2011)
                  Hissar, Bulgaria, 15-16 September 2011

*******************************************************************

<Apologies if you receive multiple copies>

CALL FOR PAPERS

In natural language processing (NLP), supervised learning scenarios are more frequently explored than unsupervised or semi-supervised ones. Unfortunately, labeled data are often highly domain-dependent and in short supply. It has therefore become increasingly important to leverage both labeled and unlabeled data to achieve the best performance in challenging NLP problems that involve learning of structured variables.

Until recently, most results in semi-supervised learning of structured variables in NLP were negative (Abney, 2008), but today the best part-of-speech taggers (Suzuki and Isozaki, 2008), named entity recognizers (Turian et al., 2010), and dependency parsers (Sagae and Tsujii, 2007; Suzuki et al., 2009; Søgaard and Rishøj, 2010) exploit mixtures of labeled and unlabeled data. Unsupervised and minimally supervised NLP are also growing rapidly.

The most commonly used semi-supervised learning algorithms in NLP are feature-based methods (Koo et al., 2008; Sagae and Gordon, 2009; Turian et al., 2010) and EM, self-training, or co-training (Mihalcea, 2004; Sagae and Tsujii, 2007; Spoustova et al., 2009). Mixture models have also been used successfully (Suzuki and Isozaki, 2008; Suzuki et al., 2009). While feature-based methods seem relatively robust, self-training and co-training are very parameter-sensitive, and parameter tuning has therefore become an important research topic (Goldberg and Zhu, 2009). This is a concern not only in NLP but also in other areas such as face recognition (see, e.g., Yan and Wang, 2009). Parameter sensitivity is even more pronounced in unsupervised learning of structured variables, e.g. unsupervised part-of-speech tagging and grammar induction.

By more robust unsupervised or semi-supervised learning algorithms we mean algorithms with few parameters that give good results across different data sets and different applications.

Specifically, we encourage submissions on the following topics:
    * assessing robustness of known or new unsupervised or semi-supervised methods across different NLP problems or languages
    * new unsupervised or semi-supervised methods for NLP problems
    * positive and negative results on the use of unsupervised or semi-supervised methods in applications
    * application-oriented evaluation of unsupervised or semi-supervised methods
    * comparison and combination of unsupervised or semi-supervised methods

This workshop aims to bring together researchers dedicated to designing and evaluating robust unsupervised or semi-supervised learning algorithms for NLP problems. These include, but are not limited to, POS tagging, grammar induction and parsing, named entity recognition, word sense induction and disambiguation, machine translation, sentiment analysis, and taxonomy learning. Our goal is to evaluate known unsupervised and semi-supervised learning algorithms, foster novel and more robust ones, and discuss positive and negative results that may otherwise not appear in a technical paper at a major conference. We welcome submissions that address the robustness of unsupervised or semi-supervised learning algorithms for NLP, and especially encourage authors to provide results for different data sets, languages, or applications.

IMPORTANT DATES
Submission deadline: July 15 2011.
Notification: August 15 2011.
Workshop: September 15-16 2011.

SUBMISSION GUIDELINES
Use the RANLP style sheets found here: http://lml.bas.bg/ranlp2011/submissions.php
We invite long (8 pages) and short (4 pages) papers. All papers will appear in the ACL bibliography. (Accepted short papers will be presented either as short oral presentations or as posters.) Submission page: www.softconf.com/ranlp11/robus2011/

PROGRAM COMMITTEE
* Steven Abney, University of Michigan
* Stefan Bordag, ExB Research & Development
* Eugenie Giesbrecht, FZI Karlsruhe
* Katja Filippova, Google
* Florian Holz, University of Leipzig
* Jonas Kuhn, University of Stuttgart
* Vivi Nastase, HITS Heidelberg
* Reinhard Rapp, JG University of Mainz
* Lucia Specia, University of Wolverhampton
* Valentin Spitkovsky, Stanford University
* Sven Teresniak, University of Leipzig
* Dekai Wu, HKUST
* Torsten Zesch, TU Darmstadt
* Jerry Zhu, University of Wisconsin-Madison

ORGANIZERS
Chris Biemann, TU Darmstadt
Anders Søgaard, University of Copenhagen

CONTACT: soegaard(at)hum.ku.dk


References:
Steven Abney. 2008. Semi-supervised learning for computational linguistics. Chapman & Hall.
Andrew Goldberg and Jerry Zhu. 2009. Keepin' it real: semi-supervised learning with realistic tuning. In NAACL.
Terry Koo et al. 2008. Simple semi-supervised dependency parsing. In ACL-HLT.
Rada Mihalcea. 2004. Co-training and self-training for word sense disambiguation. In CoNLL.
Kenji Sagae and Jun'ichi Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In CoNLL Shared Task.
Kenji Sagae and Andrew Gordon. 2009. Clustering words by syntactic similarity improves dependency parsing of predicate-argument structures. In IWPT.
Drahomira Spoustova et al. 2009. Semi-supervised training for the averaged perceptron POS tagger. In EACL.
Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In ACL-HLT.
Jun Suzuki et al. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In EMNLP.
Anders Søgaard and Christian Rishøj. 2010. Semi-supervised dependency parsing using generalized tri-training.
Joseph Turian et al. 2010. Word representations: a simple and general method for semi-supervised learning. In ACL.
Shuicheng Yan and Huan Wang. 2009. Semi-supervised learning by sparse representation. In SIAM Data Mining.



****************************
Anders Søgaard, Ass.Prof.,
Center for Language Technology
University of Copenhagen
Njalsgade 140
DK-2300 Copenhagen S
****************************

