Seminar: Alpage, Enrique Henestroza Anguiano, 27 May 2011

Fri May 20 20:31:11 UTC 2011

Date: Fri, 20 May 2011 18:02:37 +0200
From: Benoit Crabbé <benoit.crabbe at gmail.com>
Message-Id: <F41C58E8-4F09-4633-AAD6-B60DE090295E at gmail.com>
X-url: http://alpage.inria.fr/~henestro

         *************** Alpage Seminar *******************

Seminar of the Paris 7 doctoral school

This is the computational linguistics research seminar organized by
the Alpage team. Alpage is a joint Inria -- Paris 7 team whose
research focuses on automatic syntactic parsing and discourse
processing for French.

The next seminar will take place on Friday 27 May from 11:00 to 13:00
in room 3E91 at the UFRL, 175, rue du Chevaleret, 75013 Paris (3rd floor).

Anyone interested is welcome.

***********************************************************

Enrique Henestroza Anguiano (Alpage/Inria)

will speak on:

Parse Correction with Specialized Models for Difficult Attachment Types

Abstract:

While statistical syntactic parsing offers a technique for
disambiguating and recovering the syntactic structure of a sentence,
it is also subject to coverage problems for various linguistic
phenomena in the treebank used for training. Our goal is to improve
parsing performance for syntactic
structures that are difficult to recover accurately, in particular for
coordination and prepositional phrase (pp-) attachment. In this
presentation I will focus on parse correction, which tries to make the
most of the training data at our disposal by performing a second pass
after parsing that reconsiders individual attachments using richer
contextual information. I will also discuss initial work on a method
meant to address lexical coverage problems in the treebank used for
training: the injection of lexical association scores, calculated
automatically over a large text corpus, into a parse correction model
for pp-attachment.
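
As an illustration of what a corpus-derived lexical association score
can look like, here is a minimal sketch using pointwise mutual
information, one classic collocation measure. The function name
pmi_scores, the choice to score (governor, preposition) pairs, and the
toy counts are assumptions made for this example; they are not taken
from the talk.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Pointwise mutual information for (governor, preposition) pairs.

    `pairs` is a list of co-occurrences, assumed here to have been
    extracted beforehand from an automatically parsed corpus.
    """
    pair_counts = Counter(pairs)
    gov_counts = Counter(gov for gov, _ in pairs)
    prep_counts = Counter(prep for _, prep in pairs)
    total = len(pairs)

    scores = {}
    for (gov, prep), count in pair_counts.items():
        p_joint = count / total
        p_gov = gov_counts[gov] / total
        p_prep = prep_counts[prep] / total
        # PMI = log2( P(gov, prep) / (P(gov) * P(prep)) )
        scores[(gov, prep)] = math.log2(p_joint / (p_gov * p_prep))
    return scores


# Toy counts, invented for illustration only.
toy_pairs = (
    [("manger", "avec")] * 6
    + [("manger", "de")] * 2
    + [("pomme", "de")] * 6
    + [("pomme", "avec")] * 2
)
scores = pmi_scores(toy_pairs)
```

A positive score means the pair co-occurs more often than chance would
predict, and a negative score means less often. A real system would
also need frequency thresholds or smoothing for rare pairs.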

In our approach to syntactic dependency parse correction, attachments
in an input parse tree are revised by choosing, for a given dependent,
the best governor from within a small set of candidates. Assuming that
a dependency parser's predicted parse tree for a sentence is mostly
accurate, parse correction can revise attachments by using the parse
tree's syntactic structure to restrict the set of candidate governors
and extract a rich set of features over the syntactic context to help
choose among the candidates. We consider a general corrective model
that can be applied to all dependents in the output trees of a
dependency parser, and we additionally explore specialized corrective
models specific to coordination and pp-attachment. These two phenomena
are often investigated as isolated problems, but here we treat them in
the more realistic context of syntactic parsing. Our specialized
corrective models are separately trained, and include expanded feature
sets specific to the type of attachment to be corrected. For
pp-attachment, in particular, the expanded feature set includes
lexical association scores between the pp and each candidate
governor. These lexical association scores are acquired automatically
through distributional methods over a large corpus, using classic
collocation extraction measures like mutual information, t-test, and
likelihood ratio.
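
The correction pass described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the candidate
set (current governor, its own governor, and its other dependents), the
head-list tree encoding, and the toy scoring function are hypothetical
choices for this example, not the models presented in the talk.

```python
def descendants(heads, node):
    """Return the set of tokens in the subtree rooted at `node`, inclusive."""
    found = {node}
    changed = True
    while changed:
        changed = False
        for child, head in enumerate(heads):
            if head in found and child not in found:
                found.add(child)
                changed = True
    return found


def candidate_governors(heads, dep):
    """Hypothetical candidate set around the current governor of `dep`.

    Includes the current governor, its own governor, and its other
    dependents, minus the subtree of `dep` (to avoid creating cycles).
    """
    gov = heads[dep]
    candidates = {gov}
    if heads[gov] is not None:  # the root token has head None
        candidates.add(heads[gov])
    candidates.update(c for c, h in enumerate(heads) if h == gov)
    return sorted(candidates - descendants(heads, dep))


def correct_parse(heads, score):
    """Revise each attachment by picking the best-scoring candidate."""
    new_heads = list(heads)
    for dep in range(len(heads)):
        if new_heads[dep] is None:
            continue  # leave the root untouched
        candidates = candidate_governors(new_heads, dep)
        new_heads[dep] = max(candidates, key=lambda g: score(new_heads, dep, g))
    return new_heads


# Tokens: 0 Je, 1 mange, 2 une, 3 pomme, 4 avec, 5 une, 6 fourchette
# Predicted parse wrongly attaches "avec" (4) to "pomme" (3).
predicted = [1, None, 3, 1, 3, 6, 4]


def toy_score(heads, dep, gov):
    """Stand-in for a trained model: prefers attaching 'avec' to the verb."""
    if (dep, gov) == (4, 1):
        return 1.0
    return 0.5 if gov == heads[dep] else 0.0


corrected = correct_parse(predicted, toy_score)
```

In this toy sentence the only revision is the reattachment of "avec"
to the verb "mange". In practice the scoring function would be a
classifier or ranker trained over syntactic context features.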

In initial experiments, we obtain improvements in unlabeled attachment
score over two state-of-the-art statistical syntactic dependency
parsers for French (MaltParser and MSTParser). In addition to
presenting the results of these experiments, I will discuss
possibilities for improving both the dependency parse correction
algorithm and the methods used for acquiring and injecting lexical
association scores.
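
For readers unfamiliar with the metric, unlabeled attachment score is
the fraction of tokens whose predicted governor matches the gold
governor. A minimal sketch follows; the function name and the toy head
lists are assumptions for illustration, and evaluation conventions
(for example whether punctuation tokens are counted) are not specified
in the abstract.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted governor equals the gold one."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted parses differ in length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy 7-token sentence; None marks the root token.
gold = [1, None, 3, 1, 1, 6, 4]
parsed = [1, None, 3, 1, 3, 6, 4]  # one wrong attachment (token 4)

uas = unlabeled_attachment_score(gold, parsed)
```

Here six of the seven governors are correct, so the score is 6/7.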

Web page: http://alpage.inria.fr/~henestro

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------