Job: Traineeship position for Polish Text Mining and Evaluation at the JRC

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Feb 15 15:47:22 UTC 2012

Date: Sun, 12 Feb 2012 12:18:13 +0100
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <004001cce978$02faa5f0$08eff1d0$>

Readers on this list may be interested in the following traineeship
position to work on adapting multilingual text mining tools to the
Polish language. Feel free to pass on this message.

Ralf Steinberger

European Commission – Joint Research Centre (JRC)

Ispra, Italy


Call Reference Number: 2012-IPSC-16 - ISPRA
Title: Multilingual Text Mining and Evaluation (Polish)

Duration: 6 months

Location: Joint Research Centre (JRC), Ispra, Italy

URL on rules and conditions:

Application via staff recruitment application tool ESRA:

We are:

The mission of the Joint Research Centre (JRC) is to provide
customer-driven scientific and technical support for the conception,
development, implementation and monitoring of EU policies. Being a
Directorate-General of the European Commission, the JRC functions both
as the in-house science service of the Commission and as a reference
centre for science and technology for the Union. With 7 Scientific
Institutes, 3 Corporate Directorates and the DG/DDG Office, the JRC is
located in 5 Member States (Belgium, Germany, Italy, the Netherlands and
Spain). Further information is available at:

The current vacancy is in the Institute for the Protection and Security
of the Citizen (located in Ispra, Italy).

The Institute provides research results and supports EU policy-makers in
their effort towards global security and protection of European citizens
from accidents, deliberate attacks, fraud and illegal actions against EU
policies. More details on IPSC can be found at:

The vacancy is within the Global Security and Crisis Management Unit
(GlobeSec), in the OPTIMA Action (Open Source Text Information Mining
and Analysis). Research and development efforts in the OPTIMA group
produce novel and unique approaches and software that gather and analyse
an average of 100,000 media reports per day from online news portals
world-wide in 50 languages. The tools classify articles according to subject
domains, cluster related articles, summarise the news clusters, extract
information from them, aggregate the extracted information, track topics
over time, issue breaking news alerts and produce visual presentations
of the information found. See to
access the public Europe Media Monitor (EMM) portals.

We propose:

We propose a trainee position in Ispra, Italy.

We are looking for a person to help us analyse Polish language news and
social media posts, and specifically to help us adapt EMM’s multilingual
suite of text mining tools to the Polish language. EMM’s tools -
currently developed for up to 20 languages - include the following
functionality: Named Entity Recognition and disambiguation (persons,
organisations, locations, dates); co-reference resolution of definite
descriptions; quotation recognition; document clustering; document
categorisation using Boolean search expressions; multi-document
summarisation; Statistical Machine Translation.

Trainee Project Sheet

The selected person will be a member of an international and highly
motivated team of researchers and developers. They will learn about the
inner workings of some of the most highly multilingual text analysis
applications world-wide, and they are likely to become co-authors of
scientific publications on the applications they work on.

The successful candidate will be asked to contribute to the group effort by working on the following tasks:

· Creating lexical resources for Information Extraction, by using
  semi-automatic methods;

· Exploiting externally available dictionaries and corpora, which
  requires format conversion, data cleaning, consistency checking;

· Adapting the currently existing language-independent rule set to
  Polish, if necessary;

· Evaluating the output of the Polish text mining tools and helping to
  improve them;

· Possibly, producing gold-standard annotations for various information
  extraction tasks for evaluation purposes;

· Contributing to scientific publications (with co-authorship).

We look for:

We look for a candidate who fits the following description:

· University degree in Computational Linguistics or a related field,
  either completed or near completion;

· Hands-on Java programming skills;

· Knowledge of Polish morphology;

· Ability to work in a predominantly English-speaking team;

· Willingness to contribute hands-on to produce working online

One or more of the following skills would be an asset:

· Programming skills in a scripting language like Perl or Python;

· Knowledge of, and hands-on experience with, a variety of text mining

· Hands-on experience with using databases;

· Hands-on experience with using Polish linguistic resources;

· Experience in morphology, lexicology and/or text annotation;

· Knowledge, even passive, of further natural languages;

· Experience with XML and with text data format conversion.


Mandatory language skills:

· For EU nationals: knowledge of at least 2 Community official
  languages, of which one should be English, French or German. Required
  2nd language level is B2 according to the Common European Framework of
  Reference for Languages.

· For non-EU nationals: very good knowledge of English, French or
  German. Required level of the language is C2 according to the Common
  European Framework of Reference for Languages.

· Other requirements are according to the Rules Governing the
  Traineeship Scheme of the Joint Research Centre.

To apply, please follow the directions next to the published call:

Please note that only online applications via the ESRA tool will be

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
