[Corpora-List] Open PhD position: "Dynamic multi-document automatic summarization"

Gaël de Chalendar gael.de-chalendar at cea.fr
Wed Jun 11 13:43:21 UTC 2014


Urgent: Application deadline 06/25/2014

Choral [1] is an extraction-based single-document automatic summarizer 
developed at CEA LIST / LVIC (http://www-list.cea.fr/). It has been 
industrialized and is available to thousands of users. Choral relies heavily 
on the laboratory's multilingual linguistic analyzer LIMA [2]. Currently, 
Choral merely extracts verbatim, from a single source document, the sentences 
that it considers most relevant according to several criteria (the meanings 
most represented among the document's words, phrases expressing the author's 
views, the presence of complex noun phrases, ...).

Major trends in the field in recent years are multi-document summarization 
[3,4] and dynamic (or progressive) summarization [5]. A further approach, 
profile-oriented summarization, has already been explored in the laboratory 
[6]. The purpose of this thesis is to propose improvements to existing 
techniques and to integrate an implementation into Choral for experimentation.


The PhD will roughly follow this plan:
- Exploration of the literature;
- Understanding of existing tools and code;
- Proposal of possible improvements to existing approaches using tools and 
resources specific to the laboratory;
- Design and development of an implementation in Choral;
- Evaluation of results on benchmark data;
- If possible, participation in an international evaluation campaign.

The PhD will be held at the LVIC premises at Nano Innov, located in Palaiseau 
(near Ecole Polytechnique, Sup'Optique, Thales and Danone) in the Paris area, 
France. It is directly funded by CEA.

Requirements:
 - Deadline for receipt of applications: June 25, 2014
 - Master level in Computer Science with an NLP component
 - Bachelor, Master 1 and 2 or equivalent with honours (attach transcripts)

Send applications to:
mailto: gael.de-Chalendar at cea.fr

References:
[1] Flores, J. G., Ferret, O., & de Chalendar, G. Summarizing through sense 
concentration and Contextual Exploration rules: the CHORAL system at TAC 2009.
[2] Besançon, R. et al. (2010). LIMA: A Multilingual Framework for Linguistic 
Analysis and Linguistic Resources Development and Evaluation. In LREC 2010.
[3] Ji, H., Favre, B., Lin, W. P., Gillick, D., Hakkani-Tur, D., & Grishman, 
R. (2013). Open-Domain multi-document summarization via information 
extraction: Challenges and prospects. In Multi-source, Multilingual 
Information Extraction and Summarization (pp. 177-201). Springer Berlin 
Heidelberg.
[4] Munoz, R., & Atkinson, J. (2013). Rhetorics-based multi-document 
summarization. Expert Systems with Applications.
[5] Gohr, A., Spiliopoulou, M., & Hinneburg, A. (2013). Visually Summarizing 
Semantic Evolution in Document Streams with Topic Table. In Knowledge 
Discovery, Knowledge Engineering and Knowledge Management (pp. 136-150). 
Springer Berlin Heidelberg.
[6] Ferret, O., Châar, S. L., & Fluhr, C. (2004). Filtrage pour la 
construction de résumés multi-documents guidée par un profil. Traitement 
automatique des langues, 45(1), 65-93.


-- 
Gael de Chalendar
CEA LIST
Laboratoire Vision et Ingénierie des Contenus
(Vision and Content Engineering Laboratory)

CEA SACLAY - NANO INNOV
BAT. 861
Point courier 173
91191 GIF SUR YVETTE

Tél.:+33.1.69.08.01.50 Fax:+33.1.69.08.01.15 
Email : Gael.D.O.T.de-Chalendar.A at T.cea.D.O.T.fr



