[Corpora-List] Open Position @ ELDA: Computational linguist for multilingual text processing

info at elda.org info at elda.org
Tue Nov 16 16:09:57 UTC 2010


[Apologies for cross-postings]

  In conjunction with the European Commission Joint Research Centre, 
ELDA offers a 6-month position to produce an updated version of the 
sentence-aligned multilingual parallel corpus JRC-Acquis 
(http://langtech.jrc.ec.europa.eu/JRC-Acquis.html).

*Purpose of the work / Tasks:*

- Download multilingual EU documentation from a server via a dedicated 
Java application
- Convert all documents to a standardised XML format
- Clean and pre-process the data by identifying specific text parts such 
as document footers, lists of addresses and annexes
- Possibly: run off-the-shelf tools to sentence align the documents
- Carry out consistency checking of the data
- Produce statistics on the data
- Prepare the data for distribution
- Reuse the existing Perl scripts that were written to produce the first 
version of the corpus

*Profile and required skills:*

- Degree or MSc in computer science, computational linguistics, natural 
language processing or similar fields
- Good knowledge of Perl, to read and modify existing data-processing 
scripts
- Java and SQL, to use the application that accesses the EU's document database
- XML and XSLT
- Proficiency in English
- At least passive knowledge of several of the 23 official EU languages 
(see the JRC-Acquis page for details)

Salary: Commensurate with qualifications and experience.

Applications will be considered until the position is filled. The 
position is based in Paris, France, with about one week at the European 
Commission's Joint Research Centre (JRC) at Ispra in Northern Italy. 
Candidates should hold citizenship of, or residency papers for, a 
European Union country.

Applicants should send (preferably via email) a cover letter addressing 
the points listed above together with a curriculum vitae to:

Victoria Arranz
ELRA / ELDA
55-57, rue Brillat Savarin
75013 Paris
France
Fax : +33 1 43 13 33 30
Email : job at elda.org

For further information about ELRA/ELDA, see:
http://www.elda.org
http://www.elra.info

For further information about JRC, see:
http://langtech.jrc.ec.europa.eu