[Corpora-List] JRC Workshop on Exploiting parallel corpora in up to 20 languages
Ralf Steinberger
ralf.steinberger at jrc.it
Wed Jun 15 08:03:53 UTC 2005
Call for contributions / Call for participation
Please distribute widely
- JRC Ispra (Northern Italy), 26-27 September 2005 (Monday and
Tuesday)
- Travel expenses and daily allowance will be reimbursed
- Focus on the participation from the new and future EU Member
States and their languages
- Parallel corpus in up to twenty languages of the 'Acquis
Communautaire' (AC)
- Workshop web page:
http://www.jrc.it/langtech/0509_EU-Enlargement-Workshop.html
The European Commission's Joint Research Centre (JRC) is going to hold a
workshop on the exploitation of parallel corpora available for the twenty
official EU languages and is seeking scientists who can actively contribute
by presenting tools and ideas. At the same time, we are looking for persons
who would like to participate in the workshop without giving a presentation.
We are particularly interested in dealing with the new EU languages (Czech,
Estonian, Hungarian, Latvian, Lithuanian, Maltese, Polish, Slovene, Slovak)
and in creating resources for these languages, as well as in
language-independent or knowledge-poor methods.
Applications and resources of interest include, but are not restricted to:
- sentence, phrase and word alignment
- term and collocation extraction
- generation of bilingual or multilingual dictionaries
- automatic thesaurus construction and word clustering
- information extraction
- training and tuning of Machine Learning systems for statistical
MT and more
- automatic classification methods using the Eurovoc thesaurus
- usage of the generated resources for real-life applications
This workshop is part of the European Commission's effort to integrate
scientists from the new and future EU Member States
(http://www.jrc.cec.eu.int/enlargement/action2005/index.htm) into the
so-called European Research Area
(http://www.jrc.cec.eu.int/default.asp@sidsz=what_we_do&sidstsz=european_research_space.htm).
For this reason, we
are particularly looking for participants from the new EU Member States,
from Candidate and Acceding Countries and from Potential Candidate Countries
(Western Balkans).
THE CORPUS
When the ten new Member States joined the European Union in 2004, they had to
translate and ratify an existing collection of about ten thousand legal EU
documents covering a large variety of subject areas. This document
collection is referred to as the 'Acquis Communautaire' (AC). The JRC has
collected large parts of this document collection and intends to exploit it
to build multilingual term dictionaries. Because the AC (as
well as most other EC documents) has been classified according to the
multilingual subject domain classification system Eurovoc
(http://europa.eu.int/celex/eurovoc/), it should be possible to
automatically generate subject-specific term dictionaries. For some
information about the AC corpus, see:
Tomaz Erjavec, Camelia Ignat, Bruno Pouliquen & Ralf Steinberger (2005).
Massive multilingual corpus compilation; Acquis Communautaire and totale.
In: 2nd Language & Technology Conference: Human Language Technologies
as a Challenge for Computer Science and Linguistics (L&T'05). Poznań,
Poland, 21-23 April 2005. Available at http://www.jrc.cec.eu.int/langtech/
PLACE AND DATE
We plan to hold the workshop on Thursday and Friday 22/23 September 2005 at
the Joint Research Centre in Ispra. Ispra is located on Lago Maggiore,
about 60 km west of Milan. The nearest airport is Milano Malpensa. For
more details, see
http://www.jrc.cec.eu.int/langtech/WorkatJRC.html#JRC-Ispra.
EXPRESSIONS OF INTEREST
If you are interested in participating in this workshop, please send a
message to Ralf.Steinberger at jrc.it before 27 June. If you can give a
presentation, please attach an abstract of what you propose to present. If
you prefer to simply attend the workshop, please explain in a few lines why
you are interested in this workshop. We plan to let you know about your
acceptance by mid-July.
CONDITIONS OF REIMBURSEMENT
Participants giving a presentation and participants from the new EU Member
States (with or without presentation) will be reimbursed for the travel
costs incurred, and they will receive a daily allowance of 149 Euro for each of
the two working days. Participants will have to pay for the hotel and for
all other expenses out of this daily allowance. About 30 participants will
be reimbursed, with a maximum of two persons per EU Member State. Additional
persons can participate at their own cost. Please note that the
reimbursement usually takes several months, but the JRC can pre-pay the
travel tickets.
CONTACT
For scientific-technical issues and requests for participation, please
contact Mr. Ralf Steinberger (Ralf.Steinberger at jrc.it).
For organisational issues, travel, accommodation, reimbursement, etc.,
another contact person will soon be announced on the workshop web page.
Ralf Steinberger (Ralf.Steinberger at jrc.it)
IPSC - SeS - Language Technology (http://www.jrc.it/langtech)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
Tel: +39 0332 78 6271
Fax: +39 0332 78 5154
Secretary D. Negri: +39 0332 78 5648