[Corpora-List] Post-doc at LIMSI-CNRS, Orsay, France
Xavier Tannier
xtannier at limsi.fr
Wed Oct 3 07:20:40 UTC 2012
Post-doctoral position: Event-based multi-document summarization for
building timelines
http://perso.limsi.fr/Individu/xtannier/fr/Stages/post_doc_2012_chronolines.html
Keywords
/information extraction, natural language processing, temporal analysis,
events, timelines/
Location
LIMSI-CNRS, Orsay (Paris), France.
Duration
1 year
Context
Among other objectives, the nationally funded project Chronolines
<http://www.chronolines.fr> aims at creating semi-automatic timelines
from a query, based on a collection of newswire articles. Given a
user-defined topic and a set of texts, the task consists in *extracting
the most important events* concerning the topic and presenting them to
the user for validation. The ideal output would then be a set of brief
descriptions of events, together with the dates of these events.
Work on this project has already resulted in a few publications, among
which a paper at ACL 2012 on /salient date extraction/, which the
candidate can refer to for more details [1]
<http://aclweb.org/anthology-new/P/P12/P12-1077.pdf>. The candidate
would join the project team and work on some of the following issues:
* *Aggregation/Summarization*: how to choose/generate a brief
description of each event, from a set of relevant sentences.
* *Evaluation*: which metrics and which methodology to use for
objective evaluation.
* *Granularity*: as the time unit for our salient date algorithm is
the day, how to decide that several topic-related important events
occurred on the same day or, conversely, that an important event
lasted more than one day.
* *Relationship*: how to use the large collection of articles to
extract relationships between events.
Required skills
The candidate should hold a PhD in Natural Language Processing and/or
Information Retrieval, and be able to:
* Work with texts (interest in linguistic issues and how to deal with
them)
* Work with /a lot/ of texts (good programming skills, management of
large corpora, information aggregation, ability to set linguistic
issues aside when needed)
* Learn from (imperfect) references (ability to observe and
generalize, machine learning skills)
* Work with tools used and built by the team (in Linux, Java, Perl...)
Contacts:
Xavier.Tannier[at]limsi.fr
Veronique.Moriceau[at]limsi.fr
Reference:
[1] Rémy Kessler, Xavier Tannier, Caroline Hagège, Véronique Moriceau,
André Bittar. *Finding Salient Dates for Building Thematic Timelines.
<http://aclweb.org/anthology-new/P/P12/P12-1077.pdf>* In /Proceedings of
the 50th Annual Meeting of the Association for Computational Linguistics
(ACL 2012)/. Jeju Island, Republic of Korea, July 2012. © Association
for Computational Linguistics.
--
Xavier Tannier
Maître de conférences
LIMSI-CNRS (bât. 508, bureau 12, RdC)
Université Paris-Sud 11
B.P. 133
91403 ORSAY CEDEX
FRANCE
http://www.limsi.fr/~xtannier/
tel: 0033 (0)1 69 85 80 12
fax: 0033 (0)1 69 85 80 88