Job: Post-doc at LIMSI-CNRS, Orsay, France

Fri Oct 5 19:01:22 UTC 2012

Date: Wed, 03 Oct 2012 09:20:40 +0200
From: Xavier Tannier <xtannier at limsi.fr>
Message-ID: <506BE748.9060207 at limsi.fr>
X-url: http://perso.limsi.fr/Individu/xtannier/fr/Stages/post_doc_2012_chronolines.html
X-url: http://www.chronolines.fr

Post-doctoral position: Event-based multi-document summarization for
building timelines

http://perso.limsi.fr/Individu/xtannier/fr/Stages/post_doc_2012_chronolines.html

Keywords

/information extraction, natural language processing, temporal analysis,
events, timelines/

Location

LIMSI-CNRS, Orsay (Paris), France.

Duration

1 year

Context

Among other objectives, the nationally funded project Chronolines
(http://www.chronolines.fr) aims to create semi-automatic timelines from
a query, based on a collection of newswire articles. Given a user-defined
topic and a set of texts, the task consists of *extracting the most
important events* concerning the topic and presenting them to the user
for validation. The ideal output would then be a set of brief
descriptions of events, together with the dates of these events.

Work on this project has already resulted in a few publications,
including a paper at ACL 2012 on /salient date extraction/, to which the
candidate can refer for more details [1]. The candidate would join the
project team and work on some of the following issues:

  * *Aggregation/Summarization*: how to choose/generate a brief
    description of each event, from a set of relevant sentences.

  * *Evaluation*: which metrics and which methodology to use for
    objective evaluation.

  * *Granularity*: since the time unit of our salient date algorithm is
    the day, how to decide that several topic-related important events
    occurred on the same day or, conversely, that an important event
    lasted more than one day.

  * *Relationship*: how to use the large collection of articles to
    extract relationships between events.

Required skills

The candidate should hold a PhD in Natural Language Processing and/or
Information Retrieval, and be able to:

  * Work with texts (interest in linguistic issues and how to deal with
    them)

  * Work with /a lot/ of texts (good programming skills, large corpus
    management, information aggregation, ability to set aside
    linguistic issues when needed)

  * Learn from (imperfect) references (ability to observe and
    generalize, machine learning skills)

  * Work with tools used and built by the team (Linux, Java, Perl, ...)

Contacts

Xavier.Tannier[at]limsi.fr
Veronique.Moriceau[at]limsi.fr

Reference

[1] Rémy Kessler, Xavier Tannier, Caroline Hagège, Véronique Moriceau,
André Bittar. *Finding Salient Dates for Building Thematic Timelines.*
In /Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics (ACL 2012)/. Jeju Island, Republic of Korea,
July 2012. © Association for Computational Linguistics.
http://aclweb.org/anthology-new/P/P12/P12-1077.pdf

-------------------------------------------------------------------------
Message distributed via the Langage Naturel list <LN at cines.fr>
Information, subscription : http://www.atala.org/article.php3?id_article=48
Archives                  : http://listserv.linguistlist.org/archives/ln.html
                            http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------