Seminar: Alpage seminar (Giorgio Satta -- June 2)

Thu May 29 15:32:19 UTC 2008

Date: Wed, 28 May 2008 16:24:53 +0200
From: Benoit Crabbé <bcrabbe at linguist.jussieu.fr>
Message-ID: <1211984693.483d6b357acf0 at imp.linguist.jussieu.fr>
X-url: http://alpage.inria.fr/seminaire.fr.html
X-url: http://li.linguist.jussieu.fr

************************************************************************
*** PLEASE NOTE: THIS SEMINAR WILL EXCEPTIONALLY BE HELD IN ROOM 134 ***
************************************************************************

        Seminar of the Paris 7 doctoral school

This is the research seminar in computational linguistics organized
by the Alpage team. Alpage is a new joint Inria -- Paris 7 team
formed by the merger of the Atoll and Talana teams.

The team's scientific interests center on automatic syntactic
parsing and discourse processing for French.

This seminar replaces the former Talana seminar. It is held on
Mondays from 14:30 to 16:30, every two weeks.

It takes place in room 134 at 30 rue du Chateau des Rentiers, 75013 Paris
(first floor).

Anyone interested is welcome.

*************************************************************
Monday, June 2

Giorgio Satta (University of Padua)

will speak on:

Measuring Parsing Difficulty Across Treebanks

(Please note: the seminar will take place in room 134)

Abstract:

One of the main difficulties in statistical parsing is associated with
the task of choosing the correct parse tree for the input sentence.
While this difficulty is usually evaluated by means of empirical
performance measures, such as labeled precision and recall, several
theoretical measures have also been proposed in the literature, mostly
based on the notion of cross-entropy of a treebank. We show how
cross-entropy can be misleading to this end, and propose an
alternative theoretical measure, called the expected conditional
cross-entropy (ECC).

We conjecture that the ECC provides a measure of the informativeness
of a treebank, in such a way that more informative treebanks are
easier to parse under the chosen model. We test our conjecture by
comparing ECC values against standard performance measures across
several treebanks for English, French, German and Italian, as well as
other treebanks with different degrees of ambiguity and
informativeness, obtained by means of artificial transformations of a
source treebank. All of our experiments show the effectiveness of the
ECC in characterizing parsing difficulty across different treebanks,
making treebank comparison possible.

Work done in collaboration with: Anna Corazza, Alberto Lavelli

-----------------------
Seminar web page:
http://alpage.inria.fr/seminaire.fr.html
Paris 7 computational linguistics program:
http://li.linguist.jussieu.fr

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------