29.2546, Internships: Computational Linguistics: Trainee on 'Multilingual Entity-centric Event Extraction', EC's Joint Research Centre (JRC)

The LINGUIST List linguist at listserv.linguistlist.org
Fri Jun 15 15:40:53 UTC 2018


LINGUIST List: Vol-29-2546. Fri Jun 15 2018. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================


Date: Fri, 15 Jun 2018 11:40:18
From: Vanni Zavarella [vanni.zavarella at ec.europa.eu]
Subject: Computational Linguistics: Trainee on 'Multilingual Entity-centric Event Extraction', EC's Joint Research Centre (JRC), Italy

University or Organization: European Commission - Joint Research Centre (JRC)
Web Address: http://emm.newsbrief.eu/overview.html

Type of Work: Natural Language Processing

Linguistic Field(s): Computational Linguistics

Voluntary
Internship Location: Ispra, VA, Italy
Minimum Education Level: BA
Special Qualifications: Essential:

- University degree in Computational/Formal Linguistics, Computer Science or related areas
- Java programming skills
- Knowledge of Machine Learning
- Good working knowledge of English (B2 level)


Advantage:

- Knowledge of further foreign languages
- Proven advanced programming skills, especially in Java
- Good knowledge of Language Technology-related tools and methods, in particular in the area of Information Extraction
- Proven ability to work independently and as part of a team


Description:
The Text and Data Mining Unit (I3) of the European Commission’s Joint Research
Centre (JRC) in Ispra, Italy, is looking for a trainee to support the JRC’s
Europe Media Monitor (EMM) team in its effort to develop a general-purpose
application that is able to scan large text collections of various types in
order to compute time-ordered series of open-domain events involving a target
entity such as a person or an organisation. More precisely, the task focuses on:

- identification of all occurrences of a target entity in text collections
(e.g., on-line news, search engine results, social media), including named
mentions and mentions of entities that include the target entity
- identification of event triggers (relevant verb and noun phrases) involving
the target entity
- classification and labelling of the detected events at various levels of
abstraction
- assignment of time references to the events the target entity participated
in, and
- provision of intelligent filtering tools and visualisation of the event time
series.

A prototype of such an entity-centric event extraction tool has so far been
built for processing text collections in English. Future work will extend it
to cover more languages, improve the overall accuracy, cover new sources of
information, and merge information across documents and languages, among
other things.

In particular, Open Information Extraction and Knowledge Harvesting techniques
are used to tackle multilinguality and scalability, which are the two most
important design criteria in this context.

The EMM team develops various applications for gathering, aggregating and
analysing information from a wide range of sources, including for instance
on-line news (NewsBrief, MediSys), search engine results (OSINT Suite) and
social media. The methods used are mostly hybrid: machine learning tools are
used to gather evidence and to learn vocabulary and patterns, but the results
are usually controlled and optimised through human intervention. EMM
applications are used
by European Institutions, by national authorities in EU Member States, by
international organisations and by the public. EMM is part of the JRC’s
Competence Centre on Text Mining and Analysis
(https://ec.europa.eu/jrc/en/text-mining-and-analysis).

The successful trainee will contribute to the further development of the
entity-centric event extraction tool. This will involve adapting the tool to
process new languages (acquisition of language-specific resources) and/or
improving the existing ones, and devising new methods for open information
extraction. The trainee is also expected to contribute to writing a scientific
publication on the work carried out.


Application Deadline: 29-Jun-2018

Web Address for Applications: http://recruitment.jrc.ec.europa.eu/
Contact Information:
	Vanni Zavarella
	Email: vanni.zavarella at ec.europa.eu


