Job: Open Position, Computational linguist for multilingual text, ELDA

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Nov 17 09:47:30 UTC 2010

Date: Tue, 16 Nov 2010 17:09:57 +0100
From: info at
Message-ID: <4CE2ACD5.8070605 at>

[Apologies for cross-postings]

  In conjunction with the European Commission Joint Research Centre,
ELDA offers a 6-month position to produce an updated version of the
sentence-aligned multilingual parallel corpus JRC-Acquis.

*Purpose of the work / Tasks:*

- Download multilingual EU documentation from a server via a dedicated
  Java application
- Convert all documents to a standardised XML format
- Clean and pre-process the data by identifying specific text parts
  such as document footers, lists of addresses and annexes
- Possibly: run off-the-shelf tools to sentence align the documents
- Carry out consistency checking of the data
- Produce statistics on the data
- Prepare the data for distribution
- Various Perl scripts to produce the first version of the corpus
  exist and should be reused.

*Profile and required skills:*

- Degree or MSc in computer science, computational linguistics,
  natural language processing or similar fields
- Good knowledge of Perl to read and change existing data processing
  scripts
- Java and SQL, to use the application accessing the EU's document
- XML and XSLT
- Proficiency in English
- At least passive knowledge of several of the 23 official EU
  languages (see the JRC-Acquis page for details)

Salary: Commensurate with qualifications and experience.

Applications will be considered until the position is filled. The
position is based in Paris, France, with about one week at the
European Commission's Joint Research Centre (JRC) at Ispra in Northern
Italy.  Candidates should have the citizenship (or residency papers)
of a European Union country.

Applicants should send (preferably via email) a cover letter
addressing the points listed above together with a curriculum vitae to:

Victoria Arranz
55-57, rue Brillat Savarin
75013 Paris
Fax : +33 1 43 13 33 30
Email : job at

For further information about ELRA/ELDA, see:

For further information about JRC, see: 

Message distributed by the Langage Naturel list <LN at>
Information, subscription :
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership  :
