[Corpora-List] EACL 2014 Tutorial on Structured Sparsity in Natural Language Processing
Andre Martins
afm at cs.cmu.edu
Thu Mar 6 23:08:38 UTC 2014
CALL FOR PARTICIPATION
EACL 2014 Tutorial on Structured Sparsity in Natural Language Processing:
Models, Algorithms and Applications
André F. T. Martins <http://www.cs.cmu.edu/%7Eafm>, Mário A. T.
Figueiredo <http://www.lx.it.pt/%7Emtf>, Noah A. Smith
<http://www.cs.cmu.edu/%7Enasmith>, and Dani Yogatama
This tutorial will cover recent advances in sparse modeling with diverse
applications in natural language processing (NLP). A sparse model is
one that uses a relatively small number of features to map an input to
an output, such as a label sequence or parse tree. The advantages of
sparsity are, among others, compactness and interpretability; in fact,
sparsity is currently a major theme in statistics, machine learning, and
signal processing. The goal of sparsity can be seen as a continuation of
earlier goals of feature selection, and hence model selection (Della Pietra
et al., 1997; Guyon and Elisseeff, 2003; McCallum, 2003).
This tutorial will focus on methods which embed sparse model selection
into the parameter estimation problem. In such methods, learning is
carried out by minimizing a regularized empirical risk functional
composed of two terms: a "loss term," which controls the goodness of fit
to the data (e.g., log loss or hinge loss), and a "regularizer term,"
which is designed to promote sparsity.
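For concreteness, here is a minimal sketch of such an objective, assuming a
binary linear classifier trained with the log loss; all names below are
illustrative and not part of the tutorial materials:

    import numpy as np

    def regularized_risk(w, X, y, lam, regularizer):
        # Regularized empirical risk: a loss term measuring goodness of
        # fit to the data plus a regularizer term that promotes sparsity.
        # X is an (n, d) feature matrix, y holds 0/1 labels, w is the
        # weight vector, lam the regularization strength, and
        # 'regularizer' any penalty function of w (hypothetical names).
        scores = X.dot(w)
        # Log loss for a binary linear classifier, written via
        # logaddexp(0, z) = log(1 + exp(z)) for numerical stability.
        loss = np.mean(np.logaddexp(0.0, -(2 * y - 1) * scores))
        return loss + lam * regularizer(w)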
The simplest example is L1-norm regularization (Tibshirani, 1996), which
penalizes weight components individually, and has been explored in
various NLP applications (Kazama and Tsujii, 2003; Goodman, 2004). More
sophisticated regularizers, which use mixed norms over groups of
weights, can promote "structured" sparsity: i.e., they favor
sparsity patterns that are compatible with a priori knowledge about the
structure of the feature space. Such regularizers have been
proposed in the statistical and signal processing literature (Yuan and
Lin, 2006; Zhao et al., 2009; Bach et al., 2011) and are a recent topic
of research in NLP (Eisenstein et al., 2011; Martins et al., 2011; Das
and Smith, 2012; Nelakanti et al., 2013). Some regularizers are even able
to encourage structured sparsity without prior knowledge about this
structure (Bondell et al., 2007; Zeng et al., 2013). Sparsity-inducing
regularizers require specialized optimization routines for
learning (Bach et al., 2011; Wright et al., 2009; Xiao, 2009; Langford
et al., 2009).
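To make the contrast concrete, here is a small illustrative sketch of the
plain L1 penalty next to a group-Lasso (mixed L2/L1) penalty; the group
encoding is a hypothetical choice made only for this example:

    import numpy as np

    def l1_penalty(w):
        # Plain L1 norm: each weight is penalized on its own.
        return np.sum(np.abs(w))

    def group_lasso_penalty(w, groups):
        # Mixed L2/L1 ("group Lasso") norm: the sum, over groups, of the
        # L2 norm of each group's weights. Entire groups are driven to
        # zero together, giving sparsity patterns that follow the
        # structure of the feature space (e.g., whole feature templates
        # switched off). 'groups' is a list of index arrays, one per
        # group (a hypothetical encoding chosen for this illustration).
        return sum(np.linalg.norm(w[idx]) for idx in groups)

Either penalty could be passed as the 'regularizer' argument in the earlier
sketch of the regularized empirical risk.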
The tutorial will consist of three parts: (1) how to formulate the
problem, i.e., how to choose the right regularizer for the kind of
sparsity pattern intended; (2) how to solve the optimization problem
efficiently; and (3) examples of the use of sparsity within natural
language processing problems.
Tutorial outline:
1. Introduction (30 minutes):
- What is sparsity?
- Why sparsity is often desirable in NLP
- Feature selection: wrappers, filters, and embedded methods
- What has been done in other areas: the Lasso and group-Lasso,
compressive sensing, and recovery guarantees
- Theoretical and practical limitations of previous methods when applied
to typical NLP problems
- Beyond cardinality: structured sparsity
2. Group-Lasso and Mixed-Norm Regularizers (45 minutes):
- Selecting columns in a grid-shaped feature space
- Examples: multiple classes, multi-task learning, multiple kernel learning
- Mixed L2/L1 and L∞/L1 norms: the group Lasso
- Non-overlapping groups
- Example: feature template selection
- Tree-structured groups
- The general case: a DAG
- Coarse-to-fine regularization
- Unknown structure: feature grouping
- Open problems
Coffee Break (15 minutes)
3. Optimization Algorithms (45 minutes):
- Non-smooth optimization: limitations of subgradient algorithms
- Quasi-Newton methods: OWL-QN
- Proximal gradient algorithms: iterative soft-thresholding,
forward-backward and other splittings
- Computing proximal steps (see the sketch after this part of the outline)
- Other algorithms: FISTA, SpaRSA, ADMM, Bregman iterations
- Convergence rates
- Online algorithms: limitations of stochastic subgradient descent
- Online proximal gradient algorithms
- Managing general overlapping groups
- Memory footprint, time/space complexity, etc.
- The "Sparseptron" algorithm and debiasing
- Open problems (e.g., non-convex objectives)
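As a small illustration of the proximal steps listed above, here is a sketch
of one iteration of iterative soft-thresholding for the L1 regularizer; the
function names and interfaces are hypothetical, not the tutorial's own code:

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1: shrink every component of v
        # toward zero by tau; components with magnitude below tau become
        # exactly zero, which is what produces sparse iterates.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def ista_step(w, grad_loss, step, lam):
        # One iteration of iterative soft-thresholding (ISTA): a gradient
        # step on the smooth loss followed by the proximal step of the
        # L1 regularizer. 'grad_loss' is the loss gradient at w.
        return soft_threshold(w - step * grad_loss, step * lam)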
4. Applications (30 minutes):
- Sociolinguistic association discovery
- Sequence problems: named entity recognition, chunking
- Multilingual dependency parsing
- Lexicon expansion
5. Closing Remarks and Discussion (15 minutes)
Looking forward to seeing you in the tutorial!
André, Mário, Noah, Dani