Workshop: Thematic day, Optimization and Natural Language Processing, Université Paris 13, 16 December 2013

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Tue Nov 19 20:30:55 UTC 2013


Date: Mon, 18 Nov 2013 12:45:55 +0100
From: Joseph Le Roux <joseph.le.roux at gmail.com>
Message-ID: <m27gc60w7g.fsf at dyn201.lipn.univ-paris13.fr>

Hello,

Optimization methods are increasingly used in natural language
processing to design efficient techniques both for model learning and
for decoding, notably by offering a formal framework that clearly
distinguishes global constraints from local ones while still allowing
them to be combined.

The RCLN team of the Laboratoire d'Informatique de Paris Nord takes a
close interest in these methods for the joint modelling of the
different levels of linguistic description. As part of the math/STIC
cluster of Paris 13, with the support of the LABEX EFL, and in
collaboration with the AOC team of the LIPN, the RCLN team is
organizing a day of seminars on the themes of optimization and natural
language processing.

The workshop will take place on the Université Paris 13 campus in
Villetaneuse, in the Euler lecture hall (Amphi Euler):

09h30-10h00 :: Welcome (Amphi Euler)

10h00-11h00 :: Talk by André Martins     (Amphi Euler)
11h15-12h15 :: Talk by Sebastian Riedel  (Amphi Euler)

Lunch break (LIPN)

13h45-14h45 :: Talk by Xavier Carreras   (Amphi Euler)
15h00-16h00 :: Talk by Joseph Le Roux    (Amphi Euler)

16h00 :: Refreshments at the LIPN!

Attendance is free.

If you plan to attend, we ask you, for practical reasons (catering
arrangements in particular), to let us know at this address
[[http://doodle.com/87mnbcdbzdmytffc]].


Abstracts of the four scheduled talks:

------------------------------------------------------------------------
André Martins, Priberam, Lisbon
Title: AD3: A New Decoder for Structured Prediction

Abstract:
In this talk, I will present AD3 ("Alternating Directions Dual
Decomposition"), a new consensus-based decoder for problems
representable as factor graphs.  AD3 is an approximate decoder that
ignores global effects caused by the cycles of the graph, solving a
linear relaxation of the original problem. It can handle many scenarios
often encountered in NLP and IR applications, such as models with
constraints in first-order logic; models involving budget or knapsack
constraints; and combinations of structured models which are
individually tractable, but hard to decode jointly.
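
(As a rough sketch of the underlying problem, in our own notation
rather than the speaker's: MAP decoding on a factor graph and its LP
relaxation can be written as

  \max_y \sum_\alpha f_\alpha(y_\alpha)
  \;\leadsto\;
  \max_{\mu \in \mathrm{LOCAL}(G)} \sum_\alpha \theta_\alpha \cdot \mu_\alpha ,

where LOCAL(G) is the local polytope: factor marginals \mu_\alpha
constrained to agree on shared variables. On a cycle-free graph the
relaxation is tight; the cycles mentioned above are exactly where it
may not be.)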

Like other dual decomposition algorithms, AD3 has a modular
architecture, where local subproblems are solved independently, and
their solutions are gathered to compute a global update. The key
characteristic of AD3 is that each local subproblem has a quadratic
regularizer, leading to faster convergence (both theoretically and in
practice). After providing closed-form solutions for several of these
subproblems, I will proceed to discuss a recent active set method that
works for arbitrary factors, requiring only a local maximization oracle
(the same oracle required in subgradient-based dual decomposition).
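
(A minimal sketch of one AD3 iteration, in scaled ADMM form and again
in our own notation, with \rho > 0 the penalty constant and M_\alpha
the matrix selecting the consensus variables touched by factor \alpha:

  z_\alpha^{t+1}       = \arg\max_{z_\alpha} \theta_\alpha \cdot z_\alpha
                         - \tfrac{\rho}{2} \lVert M_\alpha z_\alpha - u^t + \lambda_\alpha^t \rVert^2
  u^{t+1}              = \mathrm{average}_\alpha ( M_\alpha z_\alpha^{t+1} + \lambda_\alpha^t )
  \lambda_\alpha^{t+1} = \lambda_\alpha^t + M_\alpha z_\alpha^{t+1} - u^{t+1}

The quadratic term in the first line is the regularizer mentioned
above; replacing it with a purely linear penalty gives back the kind of
local subproblem used in subgradient-based dual decomposition.)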

In the second part of the talk, I will discuss two recent applications
of AD3 in NLP problems: dependency parsing and compressive
summarization. I will present "Turbo Parser," an open source dependency
parser, which was recently improved with AD3 and the active set method
to permit fast decoding of non-projective third-order
models. Experiments in 14 languages yield state-of-the-art results, with
parsing speeds ranging between 700 and 4,000 tokens per second. For
compressive summarization, the use of AD3 leads to a system which is
modular in the three qualities that define a good summary (conciseness,
informativeness, and grammaticality), with state-of-the-art ROUGE
scores, and runtimes close to extractive summarizers.

This work was done in collaboration with Noah Smith, Mário Figueiredo,
Eric Xing, Pedro Aguiar, and Miguel Almeida.

------------------------------------------------------------------------
Sebastian Riedel, UCL, London
Title: Predict, Price and Cut: Column and Row Generation for Structured
Prediction.

Abstract:
Many problems in NLP, and structured prediction in general, can be cast
as finding high-scoring structures based on a large set of candidate
parts. For example, in second-order tagging we have to select
high-scoring transitions between tags in a globally consistent fashion.
In second-order graph-based dependency parsing we have to choose among
a quadratic number of first-order and a cubic number of second-order
edges such that the graph is both high-scoring and a tree. What makes
such problems challenging is the large number of possible parts to
consider. This number not only affects the cost of search or
optimization, but also slows down the process of extracting features
and scoring parts before they even enter the optimization problem.

In this talk I present an approach that can solve problems with large
sets of candidate parts without considering all of these parts in either
optimization or scoring. In contrast to most pruning heuristics, our
algorithm can give certificates of optimality before having optimized
over, or even scored, all parts. It does so without the need for
auxiliary models or tuning of threshold parameters. This is achieved by
a delayed column and row generation algorithm that iteratively solves an
LP relaxation over a small subset of the current candidate parts, and
then finds new high-scoring candidates that can be inserted into the
current optimal solution without removing high-scoring existing
structure. The latter step subtracts from the score of a part the price
of the resources the part requires, and is often referred to as pricing.
Some parts may score poorly after pricing yet be necessary to make the
current solution feasible. We add such parts in a step that roughly
amounts to adding violated cuts to the LP.
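
(To fix ideas, here is a schematic of such a loop. This is our own
simplification, not code from the talk; every problem-specific piece,
lp_solve, reduced_cost and violated_cuts, is left abstract and supplied
by the caller.)

    def column_and_row_generation(parts, initial, lp_solve,
                                  reduced_cost, violated_cuts):
        """Delayed column-and-row generation over a (set of) candidate parts."""
        active = set(initial)
        while True:
            # Solve the LP relaxation restricted to the active parts.
            solution, duals = lp_solve(active)
            # Pricing (column generation): a dormant part enters only if
            # its score exceeds the price of the resources it would consume.
            entering = {p for p in parts - active
                        if reduced_cost(p, duals) > 0}
            # Cutting (row generation): parts that price poorly but are
            # needed to keep the current solution feasible.
            required = violated_cuts(solution)
            if not entering and not required:
                # Optimal for the full LP even though most parts were
                # never scored: this is the certificate of optimality.
                return solution
            active |= entering | required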

We evaluate our approach on two applications: second order dependency
parsing and first order tagging with large domains. In both cases we
dramatically reduce the number of parts considered, and observe about an
order of magnitude speed-up. This is possible without loss of optimality
guarantees, and hence accuracy.

------------------------------------------------------------------------
Xavier Carreras, UPC, Barcelona
Title: Learning Automata and Grammars: From Spectral Algorithms to
Convex Optimization

Abstract:
There is an increasing interest in spectral methods to learn
latent-variable language models in the form of weighted automata and
context-free grammars. Spectral methods provide an algebraic formulation
of the problem of inducing automata or grammars from data, and directly
exploit the recurrence relations behind the model. I will review the
spectral method from an algebraic perspective, making use of Hankel
matrices as the key object behind the method: a Hankel matrix collects
all the necessary statistics of the distribution we want to learn, and
finding a low-rank factorization of this matrix yields the automaton
or grammar. Under mild assumptions, it can be shown that this method
closely approximates the target model.
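
(The recipe admits a compact implementation. The sketch below follows
the standard spectral method for weighted automata, with variable names
of our own choosing; the talk's exact formulation may differ.)

    import numpy as np

    def spectral_wfa(H, H_sigma, h_prefix, h_suffix, rank):
        """Recover a weighted automaton from Hankel statistics.

        H[u, v] = f(uv) over prefixes u and suffixes v; H_sigma maps each
        symbol s to the shifted Hankel matrix H_s[u, v] = f(u s v);
        h_prefix[u] = f(u) and h_suffix[v] = f(v).
        """
        # A rank-truncated SVD gives the low-rank factorization H ~ (U L) V'.
        U, svals, Vt = np.linalg.svd(H, full_matrices=False)
        U, V = U[:, :rank], Vt[:rank, :].T
        P_pinv = np.diag(1.0 / svals[:rank]) @ U.T  # pseudo-inverse of prefix factor
        A = {s: P_pinv @ Hs @ V for s, Hs in H_sigma.items()}  # one operator per symbol
        alpha0 = h_suffix @ V           # initial weight vector
        alpha_inf = P_pinv @ h_prefix   # final weight vector
        return alpha0, A, alpha_inf

    def wfa_value(alpha0, A, alpha_inf, word):
        """Computes f(word) = alpha0 . A[w1] ... A[wn] . alpha_inf."""
        v = alpha0
        for s in word:
            v = v @ A[s]
        return float(v @ alpha_inf)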

From here, I will show how we can reformulate the spectral learning
algorithm as a low-rank convex optimization. This will be useful to
adapt the method to other settings by adding linear constraints. I will
focus on "unsupervised" induction of context-free grammars, that is,
learning a grammar from plain strings. Our formulation involves
optimizing for a low-rank Hankel matrix that is linearly constrained to
satisfy inside-outside recursions. An analogous method can be
formulated to learn finite-state transducers from unaligned parallel
strings.
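
(We do not know the exact objective used in this work, but a standard
way to pose such a low-rank problem convexly is to replace rank with
the nuclear norm \lVert H \rVert_* while keeping the constraints
linear, roughly

  \min_H \; \lVert H \rVert_* + \tfrac{\tau}{2} \lVert \Omega(H) - \hat{p} \rVert^2
  \quad \text{s.t.} \quad \mathcal{L}(H) = 0 ,

where \Omega(H) matches entries of H against empirical string
statistics \hat{p}, and \mathcal{L}(H) = 0 collects the linear
constraints, e.g. the inside-outside recursions mentioned above.)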


------------------------------------------------------------------------
Joseph Le Roux, LIPN, Paris
Title: Combining PCFG-LA Models with Dual Decomposition: A Case Study
with Function Labels and Binarization

Abstract:

It has recently been shown that different NLP models can be effectively
combined using dual decomposition. In this talk, we show how PCFG-LAs
(Probabilistic Context-Free Grammars with Latent Annotations, the
state-of-the-art model for unlexicalized constituent parsing) lend
themselves to combination in this way.

We first show how the intractable problem of exact PCFG-LA decoding is
approximated with anchored PCFGs. Then we present a method for combining
anchored PCFGs based on the partial superposition of tree structures.
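
(As background, the generic subgradient recipe for making two decoders
agree looks as follows. This is a simplification in our own notation,
not the exact scheme of the talk: structures are represented as sets of
parts, decode_a returns the best structure of model A with every shared
part p reweighted by +lam[p], and decode_b that of model B reweighted
by -lam[p].)

    def dual_decomposition(decode_a, decode_b, shared_parts,
                           rounds=100, step=1.0):
        """Subgradient dual decomposition over two exact decoders."""
        lam = {p: 0.0 for p in shared_parts}
        y_a = None
        for t in range(1, rounds + 1):
            y_a, y_b = decode_a(lam), decode_b(lam)
            if all((p in y_a) == (p in y_b) for p in shared_parts):
                return y_a  # full agreement certifies an exact joint solution
            for p in shared_parts:
                # Subgradient step on the disagreement for part p,
                # with a decaying step size.
                lam[p] -= (step / t) * ((p in y_a) - (p in y_b))
        return y_a          # no agreement reached: return the last solution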

We experiment with the different models which result from alternative
methods of extracting a grammar from a treebank (retaining or discarding
function labels, left binarization versus right binarization) and
achieve state-of-the-art parsing performance, with a labeled Parseval
F-score of 92.4 on Wall Street Journal Section 23. This represents an
error reduction of 7% over a strong PCFG-LA product-model baseline.

This work was done in collaboration with Antoine Rozenknop and Jennifer
Foster.
