Job: Research master's internship, ERIC laboratory

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Sat Dec 1 19:53:32 UTC 2012

Date: Wed, 28 Nov 2012 15:46:39 +0100
From: Ah-Pine Julien <julien.ah-pine at>
Message-ID: <50B623CF.5070607 at>


The ERIC laboratory is offering a research-oriented, master's-level
internship lasting 6 months, starting in January 2013. The topic concerns
hybrid methods that combine symbolic and statistical approaches for
Natural Language Processing tasks (named entity recognition, opinion
detection, notably in social media).

For more details and to apply, please consult the document 

Please also forward this message to students or departments who may be
interested in the offer.

Best regards,

Combining rule-based and statistical-based techniques in Natural
Language Processing

Research internship at ERIC lab - 6 months starting from January 2013

1 Context

This internship proposal is set in the context of the French-funded
project Imagiweb. The project is concerned with the analysis of the
representation that entities want to disseminate about themselves
through web 2.0 tools. The entities the project targets are of different
kinds: politicians, companies and so on. How do the recipients perceive
these entities after the statements they make through their posts, blogs
and so on? What are the different types of representations/opinions
about these entities that emerge from social media data and from
different communities? In order to address these general issues, the
project proposes an approach that mixes different domains such as
sociology, semiology, knowledge bases, natural language processing and
text mining. The project gathers six partners: one company, AMI
software, one French group, EDF, one American group, Xerox, and three
public research labs, CEPEL (in political science), ERIC lab and LIA (in
computer science).

2 Topic description

In this context, the ERIC lab is opening an
internship on the general topic "Combining rule-based and
statistical-based techniques in Natural Language Processing". The goal
of this work is to provide a state-of-the-art report on hybrid
techniques (mixing rule-based and statistical-based methods) used in NLP
tasks. Rule-based techniques in NLP generally have high precision but
rather poor recall, whereas statistical-based methods tend to have lower
precision but higher recall. Many attempts have been made to combine
both types of approaches in order to produce better algorithms (see the
non-exhaustive list of references below). Particular attention will be
paid to the following NLP tasks: named entity recognition and opinion
detection in social media data.
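
As a purely illustrative sketch of the precision/recall trade-off described
above (not a method from the Imagiweb project or from the references), the
toy Python example below combines a hand-written rule with a stand-in for a
statistical model. Everything in it is an assumption made for illustration:
the rule `TITLE_RULE`, the table `TOY_ENTITY_PROBS` (which replaces a
trained model), the function names, and the simple union strategy.

```python
import re

# Hypothetical high-precision rule: a capitalized word right after a title.
TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Dr|President)\.?\s+([A-Z][a-z]+)")

# Hypothetical stand-in for a trained statistical model: per-token
# probabilities of being an entity. A real system would learn these.
TOY_ENTITY_PROBS = {"Xerox": 0.9, "Paris": 0.8, "Monday": 0.6}


def rule_based_entities(text):
    """Return the tokens matched by the hand-written rule."""
    return set(TITLE_RULE.findall(text))


def statistical_entities(text, threshold=0.5):
    """Return the tokens whose toy probability is at least the threshold."""
    tokens = re.findall(r"\w+", text)
    return {t for t in tokens if TOY_ENTITY_PROBS.get(t, 0.0) >= threshold}


def hybrid_entities(text, threshold=0.5):
    """Union strategy: keep rule matches and add statistical candidates."""
    return rule_based_entities(text) | statistical_entities(text, threshold)


def precision_recall(predicted, gold):
    """Compute (precision, recall) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Dr. Smith met Xerox staff in Paris on Monday."
    gold = {"Smith", "Xerox", "Paris"}
    for name, tagger in [("rules", rule_based_entities),
                         ("statistical", statistical_entities),
                         ("hybrid", hybrid_entities)]:
        print(name, precision_recall(tagger(text), gold))
```

On this toy sentence the rule finds only one entity but makes no error, the
statistical stand-in finds more entities at the cost of a false positive,
and the union recovers all gold entities. Other combination strategies (for
example, using rule outputs as features of a statistical model) are equally
possible; the union is only the simplest one to show.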

The recruited candidate will have to (i) review the main strategies
proposed in the literature for combining rule-based and
statistical-based methods; (ii) focus on the techniques that could suit
the data and tasks of the Imagiweb project; and (iii) implement the
selected methods and run experiments that compare the different
techniques. The implemented algorithms will be tested on a dataset of
tweets and/or blogs (in French) provided by other partners in the
project, in order to help detect entities' representations and
recipients' opinions.

This internship will be conducted in close relationship with Xerox

3 Requirements

Master's degree in machine learning and/or computational linguistics. The
ideal candidate has prior work experience with NLP tools, R and/or
Python, and a good understanding of French.

4 Salary

Around 436 euros / month

5 Contact and application

To apply, please send an email to Julien Ah-Pine
mailto:julien.ah-pine at and Julien Velcin
mailto:julien.velcin at with a resume, a cover letter,
the candidate's grades and the names of two references we can contact for
recommendation letters.


[1] Julien Ah-Pine and Guillaume Jacquet. Clique-based clustering for
improving named entity recognition systems. In EACL, pages 51-59, 2009.

[2] Caroline Brun. Detecting opinions using deep syntactic analysis. In
Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, and Nicolas Nicolov,
editors, RANLP, pages 392-398. RANLP 2011 Organising Committee, 2011.

[3] Thierry Charnois, Marc Plantevit, Christophe Rigotti, and Bruno
Cremilleux. Fouille de données séquentielles pour l'extraction
d'information dans les textes. Revue Traitement Automatique des Langues
(TAL), 50(3):59-87, December 2009.

[4] Diana Maynard, Kalina Bontcheva, and Dominic Rout. Challenges in
developing opinion mining tools for social media. In Proceedings of the
@NLP workshop associated with LREC 12, 2012.

[5] Ronen Feldman, Benjamin Rosenfeld, and Moshe Fresko. TEG: a
hybrid approach to information extraction. Knowl. Inf. Syst., 9(1):1-18,
January 2006.

[6] Moshe Fresko, Binyamin Rosenfeld, and Ronen Feldman. A hybrid
approach to ner by memm and manual rules. In Proceedings of the 14th ACM
international conference on Information and knowledge management, CIKM
'05, pages 361-362, New York, NY, USA, 2005. ACM.

[7] Natalia Grabar, Marie Dupuch, Amandine Périnet, and Thierry
Hamon. Hybrid 2012, innovative hybrid approaches to the processing of
textual data. In Proceedings of Hybrid 12, workshop associated to EACL
12, 2012.

[8] Kun-Lin Liu, Wu-Jun Li, and Minyi Guo. Emoticon smoothed language
models for twitter sentiment analysis. In AAAI, 2012.

[9] Xiaohua Liu, Shaodian Zhang, Furu Wei, and Ming Zhou. Recognizing
named entities in tweets. In Proceedings of the 49th Annual Meeting of
the Association for Computational Linguistics: Human Language
Technologies - Volume 1, HLT '11, pages 359-367, Stroudsburg, PA, USA,
2011. Association for Computational Linguistics.

[10] Xiaohua Liu, Ming Zhou, Furu Wei, Zhongyang Fu, and Xiangyang
Zhou. Joint inference of named entity recognition and normalization for
tweets. In Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics: Long Papers - Volume 1, ACL '12, pages
526-535, Stroudsburg, PA, USA, 2012. Association for Computational
Linguistics.

[11] David Nadeau and Satoshi Sekine. A survey of named entity
recognition and classification. Linguisticae Investigationes,
30(1):3-26, January 2007. Publisher: John Benjamins Publishing Company.

[12] Alexander Pak and Patrick Paroubek. Le microblogage pour la
microanalyse des sentiments et des opinions. TAL, 51(3):75-100, 2010.

[13] Carlos Rodríguez-Penagos, Jens Grivolla, and Joan Codina-Filbà. A
hybrid framework for scalable opinion mining in social media: detecting
polarities and attitude targets. In Proceedings of the Workshop on
Semantic Analysis in Social Media, pages 46-52, Stroudsburg, PA, USA,
2012. Association for Computational Linguistics.

[14] Rohini Srihari, Cheng Niu, and Wei Li. A hybrid approach for named
entity and sub-type tagging. In Proceedings of the sixth conference on
Applied natural language processing, ANLC '00, pages 247-254,
Stroudsburg, PA, USA, 2000. Association for Computational Linguistics.

Message distributed via the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)