[Corpora-List] Post-doc at LIMSI-CNRS, Orsay, France

Xavier Tannier xtannier at limsi.fr
Wed Oct 3 07:20:40 UTC 2012


  Post-doctoral position: Event-based multi-document summarization for
  building timelines

http://perso.limsi.fr/Individu/xtannier/fr/Stages/post_doc_2012_chronolines.html


      Keywords

/information extraction, natural language processing, temporal analysis, 
events, timelines/


      Location

LIMSI-CNRS, Orsay (Paris), France.


      Duration

1 year


    Context

Among other objectives, the nationally funded Chronolines project 
<http://www.chronolines.fr> aims to create timelines semi-automatically 
from a query, based on a collection of newswire articles. Given a 
user-defined topic and a set of texts, the task consists of *extracting 
the most important events* concerning the topic and presenting them to 
the user for validation. The ideal output would then be a set of brief 
descriptions of events, together with the dates of these events.

Work on this project has already resulted in a few publications, 
including a paper at ACL 2012 on /salient date extraction/ [1] 
<http://aclweb.org/anthology-new/P/P12/P12-1077.pdf>, which the 
candidate can refer to for more details. The candidate would join the 
project team and work on some of the following issues:

  * *Aggregation/Summarization*: how to choose/generate a brief
    description of each event, from a set of relevant sentences.
  * *Evaluation*: which metrics and which methodology to use for
    objective evaluation.
  * *Granularity*: as the time unit for our salient date algorithm is
    the day, how to decide that several topic-related important events
    occurred on the same day or, conversely, that an important event
    lasted more than one day.
  * *Relationship*: how to use the large collection of articles to
    extract relationships between events.


    Required skills

The candidate should hold a PhD in Natural Language Processing and/or 
Information Retrieval, and be able to:

  * Work with texts (interest in linguistic issues and how to deal with
    them)
  * Work with /a lot/ of texts (good programming skills, management of
    large corpora, information aggregation, ability to forget about
    linguistic issues when we need to)
  * Learn from (imperfect) references (ability to observe and
    generalize, machine learning skills)
  * Work with the tools used and built by the team (Linux, Java, Perl...)


      Contacts:

Xavier.Tannier[at]limsi.fr
Veronique.Moriceau[at]limsi.fr


      Reference:

[1] Rémy Kessler, Xavier Tannier, Caroline Hagège, Véronique Moriceau, 
André Bittar. *Finding Salient Dates for Building Thematic Timelines*. 
In /Proceedings of the 50th Annual Meeting of the Association for 
Computational Linguistics (ACL 2012)/. Jeju Island, Republic of Korea, 
July 2012. © Association for Computational Linguistics. 
<http://aclweb.org/anthology-new/P/P12/P12-1077.pdf>



-- 
Xavier Tannier
Maître de conférence
LIMSI-CNRS (bât. 508, bureau 12, RdC)
Université Paris-Sud 11
B.P. 133
91403 ORSAY CEDEX
FRANCE

http://www.limsi.fr/~xtannier/
tel: 0033 (0)1 69 85 80 12
fax: 0033 (0)1 69 85 80 88