Seminar: Alpage, 12 November 2010 (Federico Sangati)

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Nov 10 19:19:48 UTC 2010

Date: Sun, 7 Nov 2010 18:41:27 +0100
From: "Benoit Crabbé" <benoit.crabbe at>
Message-Id: <9717087E-4B23-4FA9-A886-16F0D356A06F at>

*************** Alpage Seminar *******************

  Seminar of the Paris 7 doctoral school

This is the research seminar in computational linguistics organized
by the Alpage team. Alpage is a joint Inria and Paris 7 team whose
scientific interests center on automatic syntactic parsing and
discourse processing for the language

Following the relocation of the UFRL,
the seminar will be held in room 3E91
on Friday 12 November from 11:00 to 13:00
175 rue du Chevaleret 75013 Paris (3rd floor).

Anyone interested is welcome.


Federico Sangati 

will speak to us about:

Grammatical Models for Constituency and Dependency Structures


In this talk I would like to present two areas of research I'm working
on. Both are concerned with corpus-based analyses of syntactic
structures and with the formulation of statistical models for parsing.

In the first part I will adopt Phrase Structures (PS) as the
underlying syntactic representation, and Data Oriented Parsing (DOP)
as the grammatical framework. A general assumption in many linguistic
theories is to consider a syntactic construction linguistically
relevant if there is some empirical evidence about its reusability in
a representative corpus of examples. Using this intuition, I will show
how, by adopting a kernel-based methodology, it is possible to
efficiently identify all tree fragments recurring multiple times in
any large treebank. I will illustrate how this can be useful for
guiding linguistic analysis as well as parsing novel sentences.
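
The idea of collecting fragments that recur across a treebank can be
sketched as follows. This is only a minimal illustration under stated
assumptions, not the method presented in the talk: trees are assumed
to be nested Python tuples of the form (label, child, ...), with words
as plain strings, and every pair of trees is compared node by node,
which is quadratic and much slower than an efficient kernel-based
implementation. The names common_fragment and recurring_fragments are
hypothetical.

```python
from itertools import combinations


def label(node):
    """Return the category of an internal node, or the word of a leaf."""
    return node if isinstance(node, str) else node[0]


def production(node):
    """Return the local rule at an internal node: its label plus child labels."""
    return (node[0],) + tuple(label(child) for child in node[1:])


def internal_nodes(tree):
    """Yield every internal (non-leaf) node of a tree, top-down."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def common_fragment(a, b):
    """Return the largest fragment rooted at both a and b, or None.

    Two nodes share a fragment only if they have the same production.
    Children that do not match further are cut off and kept as bare
    labels (frontier nodes). In this simplified encoding a frontier
    label and a word are both plain strings.
    """
    if isinstance(a, str) or isinstance(b, str):
        return None
    if production(a) != production(b):
        return None
    children = []
    for child_a, child_b in zip(a[1:], b[1:]):
        shared = common_fragment(child_a, child_b)
        children.append(shared if shared is not None else label(child_a))
    return (a[0],) + tuple(children)


def recurring_fragments(treebank):
    """Collect the fragments shared by at least two trees of the treebank."""
    found = set()
    for tree_1, tree_2 in combinations(treebank, 2):
        for a in internal_nodes(tree_1):
            for b in internal_nodes(tree_2):
                fragment = common_fragment(a, b)
                if fragment is not None:
                    found.add(fragment)
    return found


if __name__ == "__main__":
    # Two toy trees, invented for illustration only.
    toy_treebank = [
        ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
        ("S", ("NP", "dogs"), ("VP", ("V", "see"), ("NP", "birds"))),
    ]
    for fragment in recurring_fragments(toy_treebank):
        print(fragment)
```

The sketch keeps one shared fragment per matching pair of nodes, so
smaller fragments contained in a larger shared one are also reported;
filtering them out or counting frequencies would be further steps.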

In the second part I will introduce a novel syntactic dependency
representation (TDS) inspired by the work of Lucien Tesnière. In this
work we have attempted to go back to the roots of dependency theory,
and formulate a way to transform the English WSJ treebank into a novel
DS notation, which we claim is closer to the original formulation
than other DS conversions. I will show how TDS can
incorporate all the main advantages of both PS and modern DS, while
avoiding well-known problems concerning the choice of heads, and
better representing common linguistic phenomena such as
coordination. Finally, I will present some preliminary results of
parsing TDS.


Rens Bod and Remko Scha. Data-Oriented Language Processing: An
Overview. Tutorial Paper available at

Khalil Sima’an. A Short Introduction to the DOP Model.

Federico Sangati, Willem Zuidema, and Rens Bod. Efficiently extract
recurring tree fragments from large treebanks. In proceedings of

Lucien Tesnière. Éléments de syntaxe structurale, Klincksieck, Paris

Federico Sangati and Chiara Mazza. An English Dependency Treebank à la
Tesnière. Proceedings TLT8.

Federico Sangati. A Probabilistic Generative Model for an Intermediate
Constituency-Dependency Representation. In proceedings of the SRW,

Seminar website:
