Job: Post-doc position in multilingual text processing at the EC's JRC

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Wed Apr 16 09:15:40 UTC 2008

Date: Wed, 16 Apr 2008 10:27:19 +0200
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <009601c89f9b$affc8b90$d947bf8b at IPSC.TLD>

The application deadline is 18 April at midnight CET! Please excuse the late notice.

The European Commission’s Joint Research Centre (JRC) in Ispra, on Lago
Maggiore in Northern Italy, has an opening for a post-doc position in
multilingual text analysis (see below). The JRC runs several public
news aggregation and analysis web portals (see the list of applications
below) and provides a number of services to a wide range of
international customers. A strong focus of the JRC’s work is on
multilinguality and on tools that provide cross-lingual information
access.


Applications (a 3-page application form and an updated CV in English)
should be submitted by e-mail to the following address:


According to the Vademecum for grantholders, the remuneration is about
54,000 Euro/year plus allowances.




Automatic Multilingual Text Analysis



Category: Post-Doc researcher (category 30)

Duration: 36 months

Action: EMM 

Remuneration: see Vademecum for grantholders

URL generic call:
URL specific post:

In the Web Mining and Intelligence (EMM) activity, the successful
candidate will carry out research on automatic multilingual text
analysis. Typical subjects currently being studied are automatic event
extraction, automatic entity recognition and cross-language clustering.

These techniques are already deployed in several operational
applications, and part of the work will be in support of these
applications. The ongoing research has a strong focus on applicability
in a multilingual environment.

A new area of research is the automatic generation of summaries from
multiple documents, in particular from clusters of news articles. The
work is highly practical and goal-oriented, and research results are
expected to be used operationally. The system in which the results
will be deployed is implemented in Java as a set of servlets.

University degree in computer science or computational linguistics.
Doctoral degree in a similar discipline, or equivalent work experience
of 5 years. Good programming skills, preferably in Java, are therefore
recommended. The working language of the action is English
and strong English language skills are required. Given the
multilingual aspect of the work, active knowledge of at least one
other language and an understanding of at least one more are also
required.
Good knowledge of Arabic would be seen as an asset. 

Ralf Steinberger
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology 
URL: Applications:
URL: The science behind them:

The JRC’s Language Technology group specialises in the development of
highly multilingual text analysis tools and in cross-lingual
applications. Many applications are accessible online, e.g.:

* NewsExplorer: multilingual news
  aggregation and analysis (19 languages); allows users to navigate the
  news over time and across languages; trend analysis; collects
  information about people from the news; social network detection.

* NewsBrief: breaking news detection and display
  of the very latest thematic news from around the world; email
  alerting (22+ languages).

* MedISys Medical Information System: latest
  health-related news from around the world according to themes and
  diseases (22+ languages).

* EMM-Labs: latest developments; social
  networks; live people-in-the-news; country and theme fact sheets;
  maps showing violent events world-wide.


JRC-Acquis Multilingual Parallel Corpus (Version 3)

* Freely available for research purposes.

* 22 languages: Bulgarian, Czech, Danish, German, Greek, English,
  Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian,
  Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak,
  Slovene and Swedish.

* Over 1 billion words in total.

* Sentence alignment for 231 language pairs.

* For more information and download, see


DGT-Translation Memory

* Freely available for research purposes.

* Aligned translation units for 231 language pairs.

* Alignment manually verified.

* For more information and download, see

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
