Habilitation: Agata Savary, Representation and Processing of Composition

Thierry Hamon hamon at LIMSI.FR
Fri Mar 14 13:38:35 UTC 2014

Date: Thu, 13 Mar 2014 14:29:31 +0100
From: Agata Savary <agata.savary at univ-tours.fr>
Message-ID: <5321B2BB.3000700 at univ-tours.fr>
X-url: http://www.info.univ-tours.fr/~savary/


I am pleased to invite you to the defense of my Habilitation à Diriger
des Recherches, entitled "Representation and Processing of Composition,
Variation and Approximation in Language Resources and Tools".
The defense will take place on *Thursday, 27 March 2014 at 3 p.m.* in
*amphitheatre 3* of the Blois university campus (3 place Jean-Jaurès, 41000

The jury is composed of:

Anne ABEILLÉ, University Professor, Université Paris 7, France
Jean-Yves ANTOINE, University Professor, Université François
   Rabelais Tours, France
Béatrice DAILLE, University Professor, Université de Nantes,
Jan HAJIČ, Professor, Charles University in Prague, Czech Republic
Denis MAUREL, University Professor, Université François Rabelais
   Tours, France
Agnieszka MYKOWIECKA, Research Fellow, HDR, Polish Academy of
   Sciences, Warsaw, Poland
Joachim NIEHREN, Research Director, Institut national de recherche
   en informatique et en automatique, Lille, France

An access map for the university campus is available at the following address:

The defense will be followed by a reception in room 401, to which you
are cordially invited.

I look forward to welcoming you to this event,

Agata Savary


In my habilitation dissertation, meant to validate my capacity and
maturity to direct research activities, I present a panorama of
several topics in computational linguistics, linguistics and computer
Over the past decade, I have been notably concerned with the phenomena
of compositionality and variability of linguistic objects. I illustrate
the advantages of a compositional approach to language in the domain of
emotion detection and I explain how some linguistic objects, most
prominently multi-word expressions (MWEs), defy the compositionality
principles. I demonstrate that the complex properties of MWEs, notably
variability, are partially regular and partially idiosyncratic. This
fact places MWEs at the frontier between different levels of
linguistic processing, such as lexicon and syntax.
I show the highly heterogeneous nature of MWEs by citing two of their
existing taxonomies. After an extensive state-of-the-art study of MWE
description and processing, I summarize Multiflex, a formalism and a
tool for high-quality lexical morphosyntactic description of multi-word
units (MWUs). It uses a graph-based approach in which the inflection of
a MWU is expressed as a function of the morphology of its components
and of morphosyntactic transformation patterns. Thanks to unification,
inflection paradigms are represented compactly. Orthographic,
inflectional and syntactic variants are treated within the same
framework. The proposal is multilingual: it has been tested on six
European languages of three different origins (Germanic, Romance and
Slavic), and I believe that many others can also be successfully
covered. Multiflex proves interoperable: it adapts to different
morphological language models, token boundary definitions, and
underlying modules for the morphology of single words. It has been
applied to the creation and enrichment of linguistic resources, as well
as to morphosyntactic analysis and generation. It can be integrated into
other NLP applications requiring the conflation of different surface
realizations of the same concept.
Another chapter of my activity concerns named entities, most of which
are particular types of MWEs. Their rich semantic load has turned them
into a hot topic in the NLP community, as documented in my
state-of-the-art survey. I present the main assumptions, processes and
results of large annotation tasks at two levels (named entities and
coreference), carried out as part of the construction of the National
Corpus of Polish. I have also contributed to the development of both
rule-based and probabilistic named entity recognition tools, and to an
automated enrichment of Prolexbase, a large multilingual database of
proper names, from open sources.

With respect to multi-word expressions, named entities and coreference
mentions, I pay special attention to nested structures. This problem
sheds new light on the treatment of complex linguistic units in NLP.
When these units start being modeled as trees (or, more generally, as
acyclic graphs) rather than as flat sequences of tokens, long-distance
dependencies, discontinuities, overlapping and other frequent
linguistic properties become easier to represent. This calls for more
complex processing methods, which control larger contexts than is
usually the case in sequential processing. Thus, both named entity
recognition and coreference resolution come very close to parsing, and
named entities or mentions with their nested structures are analogous
to multi-word expressions with embedded complements.
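
As a purely illustrative sketch of the tree view described above (not
the annotation scheme or the tools of the dissertation), the Python
fragment below stores a nested named entity as a small tree of token
spans. The class name Span, the helper functions and the labels ORG,
PER and LOC are hypothetical choices made for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Span:
    """A labelled unit covering tokens[start:end], with nested units as children."""
    label: str          # hypothetical category label, e.g. "ORG"
    start: int          # index of the first token (inclusive)
    end: int            # index after the last token (exclusive)
    children: List["Span"] = field(default_factory=list)


def flatten(span: Span) -> List[Tuple[str, int, int]]:
    """Depth-first list of (label, start, end) for a span and all nested spans."""
    result = [(span.label, span.start, span.end)]
    for child in span.children:
        result.extend(flatten(child))
    return result


def depth(span: Span) -> int:
    """Number of nesting levels; a span without children has depth 1."""
    return 1 + max((depth(child) for child in span.children), default=0)


def surface(span: Span, tokens: List[str]) -> str:
    """Surface string covered by a span."""
    return " ".join(tokens[span.start:span.end])


# An organization name containing a person name and a place name.
tokens = ["Université", "François", "Rabelais", "Tours"]
org = Span("ORG", 0, 4, [Span("PER", 1, 3), Span("LOC", 3, 4)])
```

A flat, one-label-per-token encoding could keep only one of the three
units for the tokens "François Rabelais", whereas the tree keeps the
outer organization and both inner names at once.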

My parallel activity concerns finite-state methods for natural language
and XML processing. My main contribution in this field, co-authored
with two colleagues, is the first full-fledged method for
tree-to-language correction, and more precisely for correcting XML
documents with respect to a DTD. We have also produced interesting
results in incremental finite-state algorithmics, particularly relevant
to data evolution contexts such as dynamic vocabularies or user updates.

Multilinguality is the leitmotif of my research. I have applied my
methods to several natural languages, most importantly to Polish,
Serbian, English and French. I have been among the initiators of a
highly multilingual European scientific network dedicated to parsing
and multi-word expressions. I have used multilingual linguistic data in
experimental studies. I believe that it is particularly worthwhile to
design NLP solutions taking declension-rich (e.g. Slavic) languages
into account, since this leads to more universal solutions, at least as
far as nominal constructions (MWUs, NEs, mentions) are concerned. For
instance, once Multiflex had been developed with Polish in mind, it
could be applied as is to French, English, Serbian and Greek. Also, a
French-Serbian collaboration led to substantial modifications in
morphological modeling in Prolexbase in its early development stages.
This allowed for its later application to Polish with very few
adaptations of the existing model. Other researchers also stress the
advantages of NLP studies on highly inflected languages, since their
morphology encodes much more syntactic information than is the case,
for example, in English.
In this dissertation I am also expected to demonstrate my ability to
play an active role in shaping the scientific landscape on a local,
national and international scale. I describe my: (i) various scientific
collaborations and supervision activities, (ii) roles in over 10
regional, national and international projects, (iii) responsibilities in
collective bodies such as program and organizing committees of
conferences and workshops, PhD juries, and the National University
Council (CNU), (iv) activity as an evaluator and a reviewer of European
collaborative projects.

The issues addressed in this dissertation open interesting scientific
perspectives, in which special emphasis is placed on links among
various domains and communities. These perspectives include:
(i) integrating fine-grained language data into linked open data,
(ii) deep parsing of multi-word expressions, (iii) modeling multi-word
expression identification in a treebank as a tree-to-language correction
problem, and (iv) a taxonomy and an experimental benchmark for
tree-to-language correction approaches.

Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/

ATALA declines all responsibility for the content of the
messages distributed on the LN list
