<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000099">
[Apologies for cross-postings]
<p class="MsoNormal"><span lang="EN-GB">In conjunction with the European
Commission Joint Research Centre, ELDA offers a 6-month
position to produce an updated version of the sentence-aligned
multilingual parallel corpus JRC-Acquis (<a
href="http://langtech.jrc.ec.europa.eu/JRC-Acquis.html">http://langtech.jrc.ec.europa.eu/JRC-Acquis.html</a>).</span></p>
<p class="MsoNormal"><span lang="EN-GB"> <b>Purpose of the work /
Tasks:</b></span></p>
<p class="MsoNormal"><span lang="EN-GB">- Download multilingual EU
documentation
from a server via a dedicated Java application<br>
- Convert all documents to a standardised
XML format<br>
- Clean and pre-process the data by
identifying specific text parts such as document footers, lists
of addresses
and annexes<br>
- Possibly: run off-the-shelf tools to
sentence-align the documents<br>
- Carry out consistency checking of the
data<br>
- Produce statistics on the data<br>
- Prepare the data for distribution<br>
- Reuse the existing Perl scripts that were used to produce the first
version of the corpus</span></p>
<p class="MsoNormal"><b><span lang="EN-GB">Profile and required
skills:</span></b></p>
<p class="MsoNormal"><span lang="EN-GB">- Degree or MSc in computer
science,
computational linguistics, natural language processing or
similar fields<br>
- Good knowledge of Perl, to read and modify
existing data processing scripts<br>
- Java and SQL, to use the application
accessing the EU’s document database<br>
- XML and XSLT<br>
- Proficiency in English<br>
- At least passive knowledge of several of
the 23 official EU languages (see the JRC-Acquis page for
details)</span></p>
<p class="MsoNormal"><span lang="EN-GB">Salary: Commensurate with
qualifications
and experience.</span></p>
<p class="MsoNormal"><span lang="EN-GB">Applications will be
considered until the position is filled. The position is based in
Paris, France, with about one week to be spent at the European
Commission’s Joint Research Centre (JRC) at Ispra in Northern Italy.
Candidates should hold citizenship of, or residency papers for, a
European Union country.</span></p>
<p class="MsoNormal"><span lang="EN-GB">Applicants should send
(preferably via
email) a cover letter addressing the points listed above
together with a
curriculum vitae to:</span></p>
<p class="MsoNormal"><span style="">Victoria Arranz<br>
ELRA / ELDA<br>
55-57, rue Brillat Savarin<br>
75013 Paris<br>
France<br>
Fax: +33 1 43 13 33 30<br>
Email: <u><span style="color: blue;"><a
href="mailto:job@elda.org">job@elda.org</a></span></u> </span></p>
<p class="MsoNormal"><span lang="EN-GB">For further information
about ELRA/ELDA, see:<br>
<a href="http://www.elda.org/">http://www.elda.org</a><br>
<a href="http://www.elra.info/">http://www.elra.info</a></span></p>
<p class="MsoNormal"><span lang="EN-GB">For further information
about the JRC, see:<br>
<a href="http://langtech.jrc.ec.europa.eu/JRC-Acquis.html">http://langtech.jrc.ec.europa.eu</a></span></p>
</body>
</html>