Job: Post-doc at LIMSI-CNRS, Orsay, France

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Fri Oct 5 19:01:22 UTC 2012

Date: Wed, 03 Oct 2012 09:20:40 +0200
From: Xavier Tannier <xtannier at>
Message-ID: <506BE748.9060207 at>

Post-doctoral position: Event-based multi-document summarization for
building timelines


/information extraction, natural language processing, temporal analysis,
events, timelines/


LIMSI-CNRS, Orsay (Paris), France.


1 year


Among other objectives, the nationally funded project Chronolines aims at
semi-automatically creating timelines from a query, based on a collection
of newswire articles. Given a user-defined topic and a set of texts, the
task consists in *extracting the most important events* concerning the
topic and presenting them to the user for validation. The ideal output
would then be a set of brief descriptions of events, together with the
dates of these events.

Work on this project has already resulted in a few publications, among
which a paper at ACL 2012 on /salient dates extraction/, which the
candidate can refer to for more details [1]. The candidate would be
integrated into this project, working in the project team on some of the
following issues:

  * *Aggregation/Summarization*: how to choose/generate a brief
    description of each event, from a set of relevant sentences.

  * *Evaluation*: what metrics, what methodology for objective

  * *Granularity*: as the time unit for our salient date algorithm is
    the day, how to decide that several topic-related important events
    occurred on the same day or, conversely, that an important event
    lasted more than one day.

  * *Relationship*: how to use the large collection of articles to
    extract relationships between events.

Required skills

The candidate should hold a PhD in Natural Language Processing and/or
Information Retrieval, and be able to:

  * Work with texts (interest in linguistic issues and how to deal with

  * Work with /a lot/ of texts (good programming skills, big corpora
    management, information aggregation, ability to forget about
    linguistic issues when we need to)

  * Learn from (imperfect) references (ability to observe and
    generalize, machine learning skills)

  * Work with tools used and built by the team (Linux, Java, Perl...)




[1] Rémy Kessler, Xavier Tannier, Caroline Hagège, Véronique Moriceau,
André Bittar. *Finding Salient Dates for Building Thematic Timelines.* In /Proceedings of
the 50th Annual Meeting of the Association for Computational Linguistics
(ACL 2012)/. Jeju Island, Republic of Korea, July 2012. © Association
for Computational Linguistics.

