[Corpora-List] Open position at the EC's Joint Research Centre: writing IE grammars

Ralf Steinberger ralf.steinberger at jrc.it
Tue Apr 7 12:44:02 UTC 2009


 

The European Commission's Joint Research Centre (JRC,
http://ec.europa.eu/dgs/jrc/index.cfm) in Ispra, on Lago Maggiore in
Northern Italy, has another opening for a three-year position in
multilingual text analysis (see below). Applicants need either a completed
Ph.D. or five years of relevant post-graduate experience.

 

The JRC is running several public news aggregation and analysis web portals
(see http://emm.jrc.it/overview.html) and
provides a number of services to a wide range of international customers. A
strong focus in the JRC's work is on multilinguality and on tools to provide
cross-lingual information access.

 

Applications (3-page application form,
http://ipsc.jrc.ec.europa.eu/job/appl_form_grantholders.xls, an updated CV
in English and a copy of your passport/ID card) should be submitted by
e-mail to JRC-IPSC-GRANTHOLDERS at ec.europa.eu by 30 April 2009,
midnight CET.

 

According to the Vademecum for grant holders (see
http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf),
the remuneration is about 54,000 Euro/year plus allowances.

 

------------------------------------------------------------------------------

 


Multilingual text analysis - writing grammars for information extraction


 

CALL REFERENCE NO. : IPSC/G02/12 

Category: Category 30 (Requires Ph.D. or five years of relevant
post-graduate experience)

Duration: 36 months

Action: OPTIMA 

Remuneration and conditions: see Vademecum for grant holders,
http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf

URL generic call:  http://ipsc.jrc.ec.europa.eu/jobs.php?id=8
URL specific post: http://ipsc.jrc.ec.europa.eu/showgrant.php?id=124 

 

 


CALL REFERENCE NO.: IPSC/G02/12

Category: 30. Post-Doc researcher

Action: OPTIMA (http://ipsc.jrc.ec.europa.eu/showaction.php?id=18)

Application must be delivered before 30 Apr 2009, 23:59:59 CET

 

The Internet is the richest reservoir of human knowledge that has ever
existed. Advanced software tools are needed to monitor and process the vast
amount of material available on-line. The Action OPTIMA (OPensource Text
Information Mining and Analysis) develops innovative solutions for
retrieving and extracting information from the Internet, and especially from
online news and blogs. It serves many Commission Services, EU agencies and
some EU Member State authorities. The core of this action is the Europe
Media Monitor (EMM). 

 

Examples of current work are automatic sentiment analysis, multilingual
multi-document summarisation, event extraction, automatic entity recognition
and name variant mapping, as well as various cross-language applications.
Rule-based, machine learning and hybrid methods are all used to
achieve these goals. 

 

These techniques are already partly deployed in several operational
applications (see http://press.jrc.it/overview.html), and part of
the work would be in support of these applications. The on-going research
has a strong focus on applicability in a multilingual environment. The work
is highly practical and goal-oriented. Research results are expected to be
used operationally. The candidate is expected to contribute to scientific
publications of the research results. 

 

The person we are looking for will be working on research activities in the
field of automatic multilingual text analysis. We are specifically looking
for somebody with experience in writing robust grammars for information
extraction, to complement machine learning work in this area. A large part
of the candidate's work will be to help to write information extraction
patterns, either from scratch or by rewriting and generalising automatically
learned patterns. 

 

The system within which the results will be deployed is implemented in Java
as a set of servlets in Tomcat. Good programming skills, preferably in Java,
are therefore recommended. 

 

University degree in computational linguistics, computer science or related
areas. 

 

Doctoral degree in a similar discipline, or equivalent work experience of 5
years. The working language of the action is English and strong English
language skills are therefore required. Given the multilingual aspect of the
work, active knowledge of at least one other language and an understanding
of at least one more are also required. 

 

Good knowledge of Arabic, Farsi or Chinese would be seen as an asset. 

 

Duration : 36 months

 

 

 

Ralf Steinberger (Firstname.Lastname at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - OPTIMA (OPensource Text Information Mining and Analysis)
URL: Applications: http://press.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/

The JRC's Language Technology activity specialises in the development of
highly multilingual text analysis tools and in cross-lingual applications.
Many applications are accessible online, e.g.:

- NewsExplorer (http://press.jrc.it/NewsExplorer/): multilingual news
  aggregation and analysis (19 languages); lets users navigate the news over
  time and across languages; trend analysis; collects information about
  people from the news; social network detection.

- NewsBrief (http://press.jrc.it/): breaking news detection and display of
  the very latest thematic news from around the world; email alerting
  (40+ languages).

- MedISys Medical Information System (http://medusa.jrc.it/): latest
  health-related news from around the world according to themes and diseases
  (40+ languages).

- EMM-Labs (http://emm-labs.jrc.it/): latest developments; social networks;
  live people-in-the-news; country and theme fact sheets; maps showing
  violent events world-wide.

                                                       

JRC-Acquis Multilingual Parallel Corpus (Version 3)

- Freely available for research purposes.

- 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish,
  Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
  Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

- Altogether over 1 billion words.

- Sentence alignment for 231 language pairs, using the two alternative
  aligners Vanilla and HunAlign.

- For more information and download, see
  http://langtech.jrc.it/JRC-Acquis.html.

 


DGT-Translation Memory

- Freely available for research purposes.

- Aligned translation units for 231 language pairs.

- Alignment manually verified.

- For more information and download, see
  http://langtech.jrc.it/DGT-TM.html.

 

 

 

 
