[Corpora-List] EACL 2014 Tutorial on Structured Sparsity in Natural Language Processing
Andre Martins
afm at cs.cmu.edu
Thu Mar 6 23:08:38 UTC 2014
CALL FOR PARTICIPATION
EACL 2014 Tutorial on Structured Sparsity in Natural Language Processing:
Models, Algorithms and Applications
André F. T. Martins <http://www.cs.cmu.edu/%7Eafm>, Mário A. T.
Figueiredo <http://www.lx.it.pt/%7Emtf>, Noah A. Smith
<http://www.cs.cmu.edu/%7Enasmith>, and Dani Yogatama
This tutorial will cover recent advances in sparse modeling with diverse
applications in natural language processing (NLP). A sparse model is
one that uses a relatively small number of features to map an input to
an output, such as a label sequence or parse tree. The advantages of
sparsity are, among others, compactness and interpretability; in fact,
sparsity is currently a major theme in statistics, machine learning, and
signal processing. The goal of sparsity can be seen as a continuation of
earlier goals of feature selection, and hence model selection (Della Pietra
et al., 1997; Guyon and Elisseeff, 2003; McCallum, 2003).
This tutorial will focus on methods which embed sparse model selection
into the parameter estimation problem. In such methods, learning is
carried out by minimizing a regularized empirical risk functional
composed of two terms: a "loss term," which controls the goodness of fit
to the data (e.g., log loss or hinge loss), and a "regularizer term,"
which is designed to promote sparsity.
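For concreteness, here is a minimal sketch of such an objective, assuming a
binary linear classifier trained with the log loss; all names below are
illustrative and not part of the tutorial materials:

    import numpy as np

    def regularized_risk(w, X, y, lam, regularizer):
        # Regularized empirical risk: a loss term measuring goodness of
        # fit to the data plus a regularizer term that promotes sparsity.
        # X is an (n, d) feature matrix, y holds 0/1 labels, w is the
        # weight vector, lam the regularization strength, and
        # 'regularizer' any penalty function of w (hypothetical names).
        scores = X.dot(w)
        # Log loss for a binary linear classifier, written via
        # logaddexp(0, z) = log(1 + exp(z)) for numerical stability.
        loss = np.mean(np.logaddexp(0.0, -(2 * y - 1) * scores))
        return loss + lam * regularizer(w)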
The simplest example is L1-norm regularization (Tibshirani, 1996), which
penalizes weight components individually, and has been explored in
various NLP applications (Kazama and Tsujii, 2003; Goodman, 2004). More
sophisticated regularizers, which use mixed norms over groups of
weights, can promote "structured" sparsity: i.e., they favor
sparsity patterns that are compatible with a priori knowledge about the
structure of the feature space. Such regularizers have been
proposed in the statistical and signal processing literature (Yuan and
Lin, 2006; Zhao et al., 2009; Bach et al., 2011) and are a recent topic
of research in NLP (Eisenstein et al., 2011; Martins et al., 2011; Das
and Smith, 2012; Nelakanti et al., 2013). Some regularizers are even able
to encourage structured sparsity without prior knowledge about this
structure (Bondell et al., 2007; Zeng et al., 2013). Sparsity-inducing
regularizers require specialized optimization routines for
learning (Bach et al., 2011; Wright et al., 2009; Xiao, 2009; Langford
et al., 2009).
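To make the contrast concrete, here is a small illustrative sketch of the
plain L1 penalty next to a group-Lasso (mixed L2/L1) penalty; the group
encoding is a hypothetical choice made only for this example:

    import numpy as np

    def l1_penalty(w):
        # Plain L1 norm: each weight is penalized on its own.
        return np.sum(np.abs(w))

    def group_lasso_penalty(w, groups):
        # Mixed L2/L1 ("group Lasso") norm: the sum, over groups, of the
        # L2 norm of each group's weights. Entire groups are driven to
        # zero together, giving sparsity patterns that follow the
        # structure of the feature space (e.g., whole feature templates
        # switched off). 'groups' is a list of index arrays, one per
        # group (a hypothetical encoding chosen for this illustration).
        return sum(np.linalg.norm(w[idx]) for idx in groups)

Either penalty could be passed as the 'regularizer' argument in the earlier
sketch of the regularized empirical risk.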
The tutorial will consist of three parts: (1) how to formulate the
problem, i.e., how to choose the right regularizer for the kind of
sparsity pattern intended; (2) how to solve the optimization problem
efficiently; and (3) examples of the use of sparsity within natural
language processing problems.
Tutorial outline:
1. Introduction (30 minutes):
- What is sparsity?
- Why sparsity is often desirable in NLP
- Feature selection: wrappers, filters, and embedded methods
- What has been done in other areas: the Lasso and group-Lasso,
compressive sensing, and recovery guarantees
- Theoretical and practical limitations of previous methods when applied
to typical NLP problems
- Beyond cardinality: structured sparsity
2. Group-Lasso and Mixed-Norm Regularizers (45 minutes):
- Selecting columns in a grid-shaped feature space
- Examples: multiple classes, multi-task learning, multiple kernel learning
- Mixed L2/L1 and L∞/L1 norms: the group Lasso
- Non-overlapping groups
- Example: feature template selection
- Tree-structured groups
- The general case: a DAG
- Coarse-to-fine regularization
- Unknown structure: feature grouping
- Open problems
Coffee Break (15 minutes)
3. Optimization Algorithms (45 minutes):
- Non-smooth optimization: limitations of subgradient algorithms
- Quasi-Newton methods: OWL-QN
- Proximal gradient algorithms: iterative soft-thresholding,
forward-backward and other splittings
- Computing proximal steps (see the sketch after this part of the outline)
- Other algorithms: FISTA, SpaRSA, ADMM, Bregman iterations
- Convergence rates
- Online algorithms: limitations of stochastic subgradient descent
- Online proximal gradient algorithms
- Managing general overlapping groups
- Memory footprint, time/space complexity, etc.
- The "Sparseptron" algorithm and debiasing
- Open problems (e.g., non-convex objectives)
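As a small illustration of the proximal steps listed above, here is a sketch
of one iteration of iterative soft-thresholding for the L1 regularizer; the
function names and interfaces are hypothetical, not the tutorial's own code:

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1: shrink every component of v
        # toward zero by tau; components with magnitude below tau become
        # exactly zero, which is what produces sparse iterates.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def ista_step(w, grad_loss, step, lam):
        # One iteration of iterative soft-thresholding (ISTA): a gradient
        # step on the smooth loss followed by the proximal step of the
        # L1 regularizer. 'grad_loss' is the loss gradient at w.
        return soft_threshold(w - step * grad_loss, step * lam)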
4. Applications (30 minutes):
- Sociolinguistic association discovery
- Sequence problems: named entity recognition, chunking
- Multilingual dependency parsing
- Lexicon expansion
5. Closing Remarks and Discussion (15 minutes)
Looking forward to seeing you in the tutorial!
André, Mário, Noah, Dani