[Corpora-List] ACL-Workshop: Information Extraction for Balto-Slavonic languages - 1st Call for Participation

Ralf Steinberger ralf.steinberger at jrc.it
Wed Jun 6 06:42:52 UTC 2007


 

1st Call for Participation

 

We cordially invite you to participate in the forthcoming

 

 

                          ACL Workshop

                      Prague, 29 June 2007

 

       Balto-Slavonic Natural Language Processing 2007

Special Theme: Information Extraction and Enabling Technologies

               <http://langtech.jrc.it/BSNLP2007/>

 

 

 

There are over 400 million speakers of Balto-Slavonic (BS) languages
world-wide. As of 2007, almost a third of the 23 official European Union
languages belong to this group. For some BS languages, there is a rich
linguistic heritage and Language Technology is rather advanced, but many
others lag behind. This is partly due to a lack of basic linguistic
resources, which unfortunately often leads to a linguistic brain-drain:
instead of working on their own BS languages, scientists develop methods and
tools for English or other widely spoken languages because resources for
these are freely available.

 

The objective of this ACL workshop, organised by the European Commission’s
Joint Research Centre (JRC), is to promote work on Balto-Slavonic
languages, especially on Information Extraction, by helping scientists to
describe and share their resources and their efforts, in the hope that the
experience of a few will be useful to many others.

 

The presentation subjects at the workshop (see the program at
http://langtech.jrc.it/BSNLP2007/m/program.html for details) will include:
Information Extraction (scenario template filling, Named Entity Recognition,
definition extraction), name lemmatisation, word-sense discrimination,
topical text segmentation, WordNet-related developments, morphological
corpus annotation, term extraction, and hybrid POS-tagging. Most of the
talks will address, at some point, the specificities of analysing BS
languages. 

 

The invited speaker, Adam Przepiórkowski from the Polish Academy of
Sciences, will give an overview of specific linguistic phenomena of Slavonic
languages. He will show how these specific features can make Information
Extraction sometimes harder and sometimes easier than in Germanic and
Romance languages. 

 

Organizing Committee:

 

European Commission, Joint Research Centre <http://www.jrc.it/>, Language
Technology Group <http://langtech.jrc.it/>

Jakub Piskorski 

Bruno Pouliquen 

Ralf Steinberger 

Hristo Tanev 

 

--------------------------------------------------------------------------------

 

Ralf Steinberger (Ralf.Steinberger at jrc.it)

European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it)

JRC-Acquis Multilingual Parallel Corpus (Version 3)

*       Freely available for research purposes.

*       22 languages: Bulgarian, Czech, Danish, German, Greek, English,
Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian,
Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

*       Altogether over 1 billion words.

*       Sentence alignment for 210 language pairs (currently available for
version 2.2 only).

*       For more information and download, see
http://langtech.jrc.it/JRC-Acquis.html.

 

The JRC’s Language Technology group specialises in the development of highly
multilingual text analysis tools and in cross-lingual applications. Many
applications are accessible online, e.g.:

*       NewsExplorer <http://press.jrc.it/NewsExplorer/>: multilingual news
aggregation and analysis (19 languages); lets users navigate the news over
time and across languages; trend analysis; collects information about people
from the news; social network detection.

*       NewsBrief <http://press.jrc.it/>: breaking news detection and
display of the very latest thematic news from around the world; email
alerting (22+ languages).

*       MedISys <http://medusa.jrc.it/> Medical Information System: latest
health-related news from around the world according to themes and diseases
(22+ languages).


