Job: Open position, EC's Joint Research Centre, writing IE grammars

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Tue Apr 7 16:47:27 UTC 2009

Date: Tue, 07 Apr 2009 14:44:02 +0200
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <00d601c9b77e$87c189e0$97449da0$%Steinberger at>

The European Commission's Joint Research Centre (JRC) in Ispra, on Lago
Maggiore in Northern Italy, has another opening for a three-year
position in multilingual text analysis (see below). Applicants must
either have completed a Ph.D. or have five years of relevant
post-graduate experience.


The JRC runs several public news aggregation and analysis web
portals and provides a number of services to a wide range of
international customers. The JRC's work focuses strongly on
multilinguality and on tools that provide cross-lingual information
access.


Applications (3-page application form, an updated CV in English and a
copy of your passport/ID card) should be submitted by e-mail to
JRC-IPSC-GRANTHOLDERS at by 30 April 2009, midnight CET.


According to the Vademecum for grant holders, the remuneration is
about 54,000 Euro/year plus allowances.



Multilingual text analysis - writing grammars for information extraction


Category: Category 30 (Requires Ph.D. or five years of relevant
post-graduate experience)

Duration: 36 months

Action: OPTIMA 

Remuneration and conditions: see Vademecum for grant holders

URL generic call:
URL specific post: 



Category: 30. Post-Doc researcher

Action: OPTIMA

Applications must be delivered before 30 April 2009, 23:59:59 CET.

The Internet is the richest reservoir of human knowledge that has ever
existed. Advanced software tools are needed to monitor and process the
vast amount of material available on-line. The Action OPTIMA
(OPensource Text Information Mining and Analysis) develops innovative
solutions for retrieving and extracting information from the Internet,
and especially from online news and blogs. It serves many Commission
Services, EU agencies and some EU Member State authorities. The core
of this action is the Europe Media Monitor (EMM).


Examples of current work are automatic sentiment analysis,
multilingual multi-document summarisation, event extraction, automatic
entity recognition and name variant mapping, as well as various
cross-language applications. Rule-based, machine learning and hybrid
methods are used to achieve these goals.


These techniques are already deployed to some extent in several
operational applications, and
part of the work would be in support of these applications. The
on-going research has a strong focus on applicability in a
multilingual environment. The work is highly practical and
goal-oriented. Research results are expected to be used
operationally. The candidate is expected to contribute to scientific
publications of the research results.


The person we are looking for will be working on research activities
in the field of automatic multilingual text analysis. We are
specifically looking for somebody with experience in writing robust
grammars for information extraction, to complement machine learning
work in this area. A large part of the candidate's work will be
writing information extraction patterns, either from scratch or
by rewriting and generalising automatically learned patterns.


The system within which the results will be deployed is implemented in
Java as a set of servlets in Tomcat. Good programming skills,
preferably in Java, are therefore recommended.


University degree in computational linguistics, computer science or
related areas.


Doctoral degree in a similar discipline, or equivalent work experience
of 5 years. The working language of the action is English and strong
English language skills are therefore required. Given the multilingual
aspect of the work, active knowledge of at least one other language
and an understanding of at least one more are also required.


Good knowledge of Arabic, Farsi or Chinese would be seen as an asset.


Ralf Steinberger (Firstname.Lastname at
European Commission - Joint Research Centre (JRC)
IPSC - SeS - OPTIMA (OPensource Text Information Mining and Analysis)
URL: Applications:
URL: The science behind them:

The JRC's Language Technology activity specialises in the development
of highly multilingual text analysis tools and in cross-lingual
applications.  Many applications are accessible online, e.g.:

.. NewsExplorer: multilingual news aggregation and analysis
(19 languages); lets users navigate the news over time and across
languages; trend analysis; collects information about people from the
news; social network detection.

.. NewsBrief: breaking news detection and
display of the very latest thematic news from around the world; email
alerting (40+ languages).

.. MedISys Medical Information System: latest
health-related news from around the world according to themes and
diseases (40+ languages).

..  EMM-Labs: latest developments; social networks; live
people-in-the-news; country and theme fact sheets; maps showing
violent events world-wide.


JRC-Acquis Multilingual Parallel Corpus (Version 3)

..  Freely available for research purposes.

..  22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian,
Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene
and Swedish.

..  Altogether over 1 billion words.

..  Sentence alignment for 231 language pairs, using the two
alternative aligners Vanilla and HunAlign.

..  For more information and download, see

DGT-Translation Memory

..  Freely available for research purposes.

..  Aligned translation units for 231 language pairs.

..  Alignment manually verified.

..  For more information and download, see
