[Corpora-List] 2 open trainee positions in 'Multilingual Text Mining and Evaluation'

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu Mar 4 14:05:16 UTC 2010


The Joint Research Centre of the European Commission in Ispra, Northern
Italy, has two open traineeship positions for Computational Linguists. 
 
For one of them, a Turkish speaker would be best suited, as the trainee's
task would be to improve the Turkish language coverage of our multilingual
news analysis applications, which are publicly available via
http://press.jrc.it/overview.html . 
 
If you are interested, please read the conditions described on the
page http://ipsc.jrc.ec.europa.eu/jobs.php?id=7 and follow the application
procedure described there. 
 
Below is my summary of the text given in the official call for
interest.
 

Multilingual Text Mining and Evaluation (2 positions)
Duration: 3-12 months
Monthly allowance: 963 Euro 
Application deadline: 28 March 2010 midnight CET
URLs: http://ipsc.jrc.ec.europa.eu/showtrainee.php?id=216 
http://ipsc.jrc.ec.europa.eu/showtrainee.php?id=217 


Research and development efforts in the Open Source Text Information Mining
and Analysis group (OPTIMA) produce novel and unique approaches and software
that gather and analyse an average of 100,000 media reports per day from
online news portals world-wide in 50 languages. The tools classify the articles
by subject domain, cluster related articles, summarise the news clusters,
extract information from them (named entities, sentiment, etc.), aggregate
the extracted information, track topics over time, issue breaking news
alerts and produce visual presentations of the information found. See
http://emm.newsbrief.eu/overview.html to access our public Europe Media
Monitor (EMM) portals. 


The selected persons will be members of an international and highly
motivated team of researchers and developers. They will learn about the
inner working of some of the most highly multilingual text analysis
applications world-wide, and they are likely to become co-authors of
scientific publications on the applications they work on. 


The successful candidates will be asked to contribute to the group effort by
working on one or more of the following tasks: 


*	Improving Turkish news gathering and text analysis; 
*	Evaluating the output of the current multilingual tools and helping
to improve them; 
*	Extending current lexical resources (e.g. for sentiment analysis and
information extraction), using semi-automatic methods, in one or more
languages; 
*	Producing gold-standard annotations for sentiment analysis (opinion
mining), multi-document summarisation and information extraction; 
*	Producing multilingual parallel corpora on the basis of existing
in-house document collections, which may include format conversion, data
cleaning, consistency checking and sentence alignment; 
*	Contributing to scientific publications (with co-authorship). 
 
Required profile:


*	University degree in Computational Linguistics or a related field,
either completed or near completion; 
*	Ability to work in a predominantly English-speaking team; 
*	At least passive knowledge of several natural languages; 
*	Programming skills, especially in Java; 
*	Willingness to contribute hands-on to produce working online
applications. 
 
Depending on the concrete task, one or more of the following skills are also
required:


*	Programming skills (especially Java and Perl, ...); 
*	Knowledge of, and hands-on experience with, a variety of text
mining tools; 
*	Experience in lexicology and/or text annotation; 
*	Experience with text data format conversion and the application of
sentence alignment software. 
 
 
Ralf Steinberger <http://langtech.jrc.ec.europa.eu/RS.html>  
European Commission - Joint Research Centre (JRC)
IPSC - GlobeSec - OPTIMA (OPensource Text Information Mining and Analysis)
URL - Applications: http://emm.jrc.it/overview.html 
URL - The science behind them: http://langtech.jrc.ec.europa.eu 
21027 Ispra (VA), Italy


 
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

