Job: Open Trainee positions at the EC's Joint Research Centre in Italy

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Jul 20 20:33:47 UTC 2007

Date: Thu, 19 Jul 2007 10:49:02 +0200
From: Ralf Steinberger <ralf.steinberger at>
Message-id: <007201c7c9e1$a7fb0920$d547bf8b at IPSC.TLD>

Apologies for multiple postings!


The European Commission's Joint Research Centre (JRC) is advertising
scientific trainee positions in a large variety of fields, including
three profiles related to text analysis:

IPSC/G02-5/2007  Web Mining and Information Extraction
IPSC/G02-6/2007  Multilingual text analysis tools
IPSC/G02-7/2007  Political scientist


For the full call, see

Below you will find information on the profile 'Multilingual text
analysis tools'.



Location: Ispra, on Lago Maggiore in Italy, 60 km west of Milan;

Host: European Commission - Joint Research Centre (JRC)

Position: traineeship / internship / stage / Praktikum / tirocinio;

Starting date: late 2007 or 2008;

Duration: 3 to 12 months;

Remuneration: 963 Euro per month + travel allowance;

Nationality: Applicants must have the nationality of an EU Member
State, of an Associated EU Candidate Country, an Associated State
or a Developing Country;

Working language: English;

Activity: Language Technology, Web Technology; many other subject


Deadline: Open call. First cut-off date: Friday 14 September 2007




The European Commission's Joint Research Centre in Italy is seeking
students or recent graduates to spend an internship with our
motivated and successful multinational team of scientists and
developers producing concrete and widely used applications. Successful
applicants will want to produce hands-on results and to work in a
team. The trainees will learn about our multilingual text analysis
tools (covering between 19 and 32 languages) and their integration
into complex and highly used web portals: our news analysis pages
receive up to 1.2 million hits per day. The trainees will also
gain experience of working in the multilingual, multinational,
multi-disciplinary environment of an international organisation.


Depending on your profile, you can expect to work on one or more of
the following subject areas:


- Information Extraction: named entities, relations, event scenarios;

- Symbolic or statistical approaches;

- Writing English event and relation extraction rules;

- Document Clustering, Categorisation (Classification);

- Terminology extraction, multilingual lexicology;

- Social networks;

- Visualisation;

- Topic detection and tracking, Trend detection;

- Adapting the JRC's tool set to new languages;

- Web log analysis for our applications;

- Applying text analysis tools to the medical or political domains;

- Mining the NewsExplorer name

- JAVA re-implementation of PERL programs;

- ...


Applicants must have good programming skills in JAVA or PERL and must
be able to use English as a working language.


Experience with one or more of the following would be a plus:
databases, web technology, XML, knowledge of several natural languages
(even passive), knowledge of - or interest in - medicine or political
science, experience of working with thesauri, ontologies,


If you are interested in this opportunity and you feel that you can
contribute to any of the tasks mentioned above, please follow the
instructions given at

Please carbon-copy your email application to Ralf.Steinberger AT

For information on the European Commission's Joint Research Centre and
its Web and Language Technology group, see

For more information on traineeships, cost of living, etc., see



Ralf Steinberger
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology


JRC-Acquis Multilingual Parallel Corpus (Version 3)

.  Freely available for research purposes.

.  22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian,
Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene
and Swedish.

.  Altogether over 1 Billion words.

.  Sentence alignment for 231 language pairs.

.  For more information and download, see


The JRC's Language Technology group specialises in the development of
highly multilingual text analysis tools and in cross-lingual
applications. Many applications are accessible online, e.g.:

. NewsExplorer: multilingual news
aggregation and analysis (19 languages); lets users navigate the news
over time and across languages; trend analysis; collects information
about people from the news; social network detection.

. NewsBrief: breaking news detection and display
of the very latest thematic news from around the world; email alerting
(22+ languages).

. MedISys Medical Information System: latest
health-related news from around the world according to themes and
diseases (22+ languages).

Message distributed via the Langage Naturel (LN) list.

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues).