Seminar: Giorgio Satta, Alpage Seminar, Monday 2 June 2008

Tue May 13 10:39:17 UTC 2008

Date: Fri,  9 May 2008 19:36:58 +0200
From: Benoit Crabbé <bcrabbe at linguist.jussieu.fr>
Message-ID: <1210354618.48248bbada05f at imp.linguist.jussieu.fr>
X-url: http://alpage.inria.fr/seminaire.fr.html
X-url: http://li.linguist.jussieu.fr

              Paris 7 Doctoral School Seminar

This is the research seminar in computational linguistics organized by
the Alpage team. Alpage is a new joint Inria -- Paris 7 team formed by
the merger of the Atoll and Talana teams.

The team's research focuses on automatic syntactic parsing and
discourse processing for French.

This seminar replaces the former Talana seminar. It is held every two
weeks, on Mondays from 14:30 to 16:30.

It takes place in room 131 at 30 rue du Chateau des Rentiers, 75013 Paris
(first floor).

Anyone interested is welcome.

*************************************************************
Monday 2 June

Giorgio Satta (University of Padua)

will speak on:

Measuring Parsing Difficulty Across Treebanks

Abstract:

One of the main difficulties in statistical parsing is associated with
the task of choosing the correct parse tree for the input sentence.
While this difficulty is usually evaluated by means of empirical
performance measures, such as labeled precision and recall, several
theoretical measures have also been proposed in the literature, mostly
based on the notion of cross-entropy of a treebank.  We show how
cross-entropy can be misleading to this end, and propose an
alternative theoretical measure, called the expected conditional
cross-entropy (ECC).

We conjecture that the ECC provides a measure of the informativeness
of a treebank, in such a way that more informative treebanks are
easier to parse under the chosen model.  We test our conjecture by
comparing ECC values against standard performance measures across
several treebanks for English, French, German and Italian, as well as
other treebanks with different degrees of ambiguity and
informativeness, obtained by means of artificial transformations of a
source treebank.  All of our experiments show the effectiveness of the
ECC in characterizing parsing difficulty across different treebanks,
making treebank comparison possible.

Work done in collaboration with: Anna Corazza, Alberto Lavelli

-----------------------
Seminar web page:
http://alpage.inria.fr/seminaire.fr.html
Paris 7 Computational Linguistics program:
http://li.linguist.jussieu.fr

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------