Job: Open position at the EC's Joint Research Centre: multilingual text analysis

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Tue Feb 17 20:16:47 UTC 2009

Date: Tue, 17 Feb 2009 09:40:56 +0100
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <0KF700491C4ALS20 at>

The European Commission’s Joint Research Centre (JRC) in Ispra, on
Lago Maggiore in northern Italy, has an opening for a three-year
position in multilingual text analysis (see below). Applicants need
either a completed Ph.D. or five years of relevant post-graduate
experience.

The JRC is running several public news aggregation and analysis web
portals (see and provides a number of
services to a wide range of international customers. The JRC’s work
focuses strongly on multilinguality and on tools that provide
cross-lingual information access.

Applications (3-page application form, an updated CV in English and a
copy of your passport/ID card) should be submitted by e-mail to the
following e-mail address: JRC-IPSC-GRANTHOLDERS at by 15
March 2009 midnight CET.

According to the Vademecum for grant holders (see,
the remuneration is about 54,000 Euro/year plus allowances.


Automatic Multilingual Text Analysis II


Category: Category 30 (Requires Ph.D. or five years of relevant
post-graduate experience)

Duration: 36 months

Action: OPTIMA 

Remuneration and conditions: see Vademecum for grantholders

URL generic call:
specific post: ht

The Internet is the richest reservoir of human knowledge that has ever
existed. Advanced software tools are needed to monitor and process the
vast amount of material available on-line. The Action OPTIMA
(OPensource Text Information Mining and Analysis) develops innovative
solutions for retrieving and extracting information from the Internet
and from other Open Sources. It serves many Commission Services, EU
agencies and some member state authorities. The core of this action is
the Europe Media Monitor (EMM).

In this action, the successful candidate will carry out research on
automatic multilingual text analysis. Subjects currently being studied
include automatic event extraction, automatic entity recognition and
cross-language clustering.

These techniques are already deployed, to some extent, in several
operational applications, and part of the work will be in support of
these applications. The ongoing research has a strong focus on
applicability in a multilingual environment.

The work is highly practical and goal-oriented. Research results are
expected to be used operationally. The candidate is expected to
contribute to scientific publications of the research results.

The system within which the results will be deployed is implemented in
Java as a set of servlets in Tomcat. Good programming skills,
preferably in Java, are therefore recommended.

University degree in computer science or computational linguistics.

Doctoral degree in a similar discipline, or equivalent work experience
of 5 years. The working language of the action is English, and strong
English language skills are required. Given the multilingual aspect of
the work, active knowledge of at least one other language and an
understanding of at least one more are also required.

Good knowledge of Arabic, Farsi or Chinese would be seen as an asset.

Ralf Steinberger (Firstname.Lastname at
Commission - Joint Research Centre (JRC)
IPSC - SeS - OPTIMA (OPensource Text Information Mining and Analysis)
URL:
Applications:
The science behind them:

The JRC’s Language Technology activity specialises in the development
of highly multilingual text analysis tools and in cross-lingual
applications. Many applications are accessible online, e.g.:

  NewsExplorer: multilingual news aggregation and analysis (19
  languages); lets users navigate the news over time and across
  languages; trend analysis; collects information about people from
  the news; social network detection.

  NewsBrief: breaking news detection and display of the very latest
  thematic news from around the world; email alerting (40+ languages).

  MedISys Medical Information System: latest health-related news from
  around the world according to themes and diseases (40+ languages).

  EMM-Labs: Latest developments; social networks; live
  people-in-the-news; country and theme fact sheets; maps showing
  violent events world-wide.

JRC-Acquis Multilingual Parallel Corpus (Version 3)

 Freely available for research purposes.

 22 languages: Bulgarian, Czech, Danish, German, Greek, English,
 Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian,
 Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak,
 Slovene and Swedish.

 Altogether over 1 billion words.

 Sentence alignment for 231 language pairs, using the two alternative
 aligners Vanilla and HunAlign.

 For more information and download, see

DGT-Translation Memory

 Freely available for research purposes.

 Aligned translation units for 231 language pairs.

 Alignment manually verified.

 For more information and download, see

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
