[Corpora-List] Open PhD position: "Dynamic multi-document automatic summarization"
Gaël de Chalendar
gael.de-chalendar at cea.fr
Wed Jun 11 13:43:21 UTC 2014
Urgent: Application deadline 06/25/2014
Choral [1] is an extraction-based, single-document automatic summarizer
developed at CEA LIST / LVIC (http://www-list.cea.fr/). It has been
industrialized and is available to thousands of users. Choral relies heavily
on the laboratory's multilingual linguistic analyzer LIMA [2]. Currently,
Choral merely extracts, verbatim, the sentences of a single source document
that it considers most relevant according to several criteria (the most
represented meanings among the document's words, phrases expressing the
author's views, the presence of complex noun phrases, ...).
Recent trends in the field include multi-document summarization [3,4] and
dynamic (or progressive) summarization [5]. Another approach, profile-oriented
summarization, has already been explored in the laboratory [6]. The purpose
of this thesis is to propose improvements to existing techniques and to
integrate an implementation into Choral for experimentation.
The PhD will roughly follow this plan:
- Exploration of the literature;
- Understanding of existing tools and code;
- Proposal of possible improvements to existing approaches using tools and
resources specific to the laboratory;
- Design and development of an implementation in Choral;
- Evaluation of results on benchmark data;
- If possible, participation in an international evaluation campaign.
The PhD will take place on the LVIC premises at Nano Innov in Palaiseau (near
Ecole Polytechnique, Sup'Optique, Thales and Danone), in the Paris area,
France. It is directly funded by CEA.
Requirements:
- Application deadline: June 25, 2014
- Master's degree in Computer Science with an NLP component
- Bachelor's, Master 1 and Master 2 (or equivalent) with honours (attach transcripts)
Send applications to:
gael.de-chalendar at cea.fr
References:
[1] Flores, J. G., Ferret, O., & de Chalendar, G. Summarizing through sense
concentration and Contextual Exploration rules: the CHORAL system at TAC 2009.
[2] Besançon, R. et al. (2010). LIMA: A Multilingual Framework for Linguistic
Analysis and Linguistic Resources Development and Evaluation. In LREC 2010.
[3] Ji, H., Favre, B., Lin, W. P., Gillick, D., Hakkani-Tur, D., & Grishman,
R. (2013). Open-Domain multi-document summarization via information
extraction: Challenges and prospects. In Multi-source, Multilingual
Information Extraction and Summarization (pp. 177-201). Springer Berlin
Heidelberg.
[4] Munoz, R., & Atkinson, J. (2013). Rhetorics-based multi-document
summarization. Expert Systems with Applications.
[5] Gohr, A., Spiliopoulou, M., & Hinneburg, A. (2013). Visually Summarizing
Semantic Evolution in Document Streams with Topic Table. In Knowledge
Discovery, Knowledge Engineering and Knowledge Management (pp. 136-150).
Springer Berlin Heidelberg.
[6] Ferret, O., Châar, S. L., & Fluhr, C. (2004). Filtrage pour la
construction de résumés multi-documents guidée par un profil. Traitement
automatique des langues, 45(1), 65-93.
--
Gael de Chalendar
CEA LIST
Laboratoire Vision et Ingénierie des Contenus
(Vision and Content Engineering Laboratory)
CEA SACLAY - NANO INNOV
BAT. 861
Point courier 173
91191 GIF SUR YVETTE
Tél.:+33.1.69.08.01.50 Fax:+33.1.69.08.01.15
Email: gael.de-chalendar at cea.fr
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora