Open Position @ ELDA: Computational linguist for multilingual text processing
info at elda.org
Tue Nov 16 16:09:57 UTC 2010
[Apologies for cross-postings]
In conjunction with the European Commission Joint Research Centre (JRC),
ELDA offers a 6-month position to produce an updated version of the
sentence-aligned multilingual parallel corpus JRC-Acquis
(http://langtech.jrc.ec.europa.eu/JRC-Acquis.html).
*Purpose of the work / Tasks:*
- Download multilingual EU documentation from a server via a dedicated
Java application
- Convert all documents to a standardised XML format
- Clean and pre-process the data by identifying specific text parts such
as document footers, lists of addresses and annexes
- Possibly, run off-the-shelf tools to sentence-align the documents
- Carry out consistency checking of the data
- Produce statistics on the data
- Prepare the data for distribution
- Reuse the existing Perl scripts that were used to produce the first
version of the corpus.
*Profile and required skills:*
- Degree or MSc in computer science, computational linguistics, natural
language processing or a related field
- Good knowledge of Perl, to read and modify the existing data-processing
scripts
- Java and SQL, to use the application that accesses the EU's document
database
- XML and XSLT
- Proficiency in English
- At least passive knowledge of several of the 23 official EU languages
(see the JRC-Acquis page for details)
Salary: Commensurate with qualifications and experience.
Applications will be considered until the position is filled. The
position is based in Paris, France, with about one week at the European
Commission's Joint Research Centre (JRC) at Ispra in Northern Italy.
Candidates should hold citizenship of (or residency papers for) a
European Union country.
Applicants should send (preferably via email) a cover letter addressing
the points listed above together with a curriculum vitae to:
Victoria Arranz
ELRA / ELDA
55-57, rue Brillat Savarin
75013 Paris
France
Fax : +33 1 43 13 33 30
Email : job at elda.org
For further information about ELRA/ELDA, see:
http://www.elda.org
http://www.elra.info
For further information about JRC, see:
http://langtech.jrc.ec.europa.eu
More information about the HPSG-L mailing list