Conf: Information Extraction for Balto-Slavonic languages, ACL-Workshop

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Jun 8 15:35:45 UTC 2007

Date: Wed, 06 Jun 2007 08:42:52 +0200
From: Ralf Steinberger <ralf.steinberger at>

1st Call for Participation

We cordially invite you to participate in the forthcoming

                         ACL Workshop

                     Prague, 29 June 2007

      Balto-Slavonic Natural Language Processing 2007

Special Theme: Information Extraction and Enabling Technologies


There are over 400 million speakers of Balto-Slavonic (BS) languages
world-wide. As of 2007, almost a third of the 23 official European
Union languages belong to this group. For some BS-languages, there is
a rich linguistic heritage and Language Technology is rather advanced,
but many others lag behind. This is partly due to a lack of basic
linguistic resources, which unfortunately often leads to a linguistic
brain-drain: instead of working on their own BS languages, scientists
develop methods and tools for English or other widely spoken languages
because resources for these are freely available.

The objective of this ACL workshop, organised by the European
Commission's Joint Research Centre (JRC), is to promote work on
Balto-Slavonic languages, especially work on Information Extraction,
by helping scientists to describe and share their resources and
efforts, in the hope that the experiences of a few will be useful to
many others.

The presentation subjects at the workshop (see the program for details) will
include: Information Extraction (scenario template filling, Named
Entity Recognition, definition extraction), name lemmatisation,
word-sense discrimination, topical text segmentation, WordNet-related
developments, morphological corpus annotation, term extraction, and
hybrid POS-tagging. Most of the talks will address, at some point, the
specificities of analysing BS languages.

The invited speaker, Adam Przepiórkowski from the Polish Academy of
Sciences, will give an overview of specific linguistic phenomena of
Slavonic languages. He will show how these specific features can make
Information Extraction sometimes harder and sometimes easier than in
Germanic and Romance languages.

Organizing Committee:

European Commission, Joint Research Centre, Language Technology Group

Jakub Piskorski

Bruno Pouliquen

Ralf Steinberger

Hristo Tanev


Ralf Steinberger (Ralf.Steinberger at
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology ( 

JRC-Acquis Multilingual Parallel Corpus (Version 3)

· Freely available for research purposes.

· 22 languages: Bulgarian, Czech, Danish, German, Greek, English,
  Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian,
  Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak,
  Slovene and Swedish.

· Altogether over 1 billion words.

· Sentence alignment for 210 language pairs (currently available for
  version 2.2 only).

· For more information and download, see

The JRC's Language Technology group specialises in the development of
highly multilingual text analysis tools and in cross-lingual
applications. Many applications are accessible online, e.g.:

· NewsExplorer: multilingual news aggregation and analysis (19
  languages); allows users to navigate the news over time and across
  languages; trend analysis; collects information about people from
  the news; social network detection.

· NewsBrief: breaking news detection and display of the very latest
  thematic news from around the world; email alerting (22+ languages).

· MedISys Medical Information System: latest health-related news from
  around the world according to themes and diseases (22+ languages).
