[Corpora-List] Call for participation: LREC 2008 Workshop on Comparable Corpora

Mon May 19 07:05:02 UTC 2008

=============================================================
  Building and Using Comparable Corpora
  LREC 2008 Post-Conference Workshop

  May 31st, 2008, Marrakesh

  Program and Call for Participation
  http://www.limsi.fr/~pz/lrec2008-comparable-corpora/
=============================================================

  This workshop aims to bring together researchers interested in the
  constitution and use of comparable corpora. Contributions on the
  constitution and application of comparable corpora will be
  presented by both linguists and computational linguists.

  We are pleased to announce that Dr Serge Sharoff, Centre for
  Translation Studies, School of Modern Languages and Cultures,
  University of Leeds, will give a special talk on "Parallel worlds:
  Recent advances in finding translations in comparable corpora"
  at the Workshop.

---------------------
PROGRAM

----------------
09:00 Welcome and Introduction
----------------

09:15-10:15 Oral Session 1: Some Challenges
09:15
  Gloria Corpas Pastor, Ruslan Mitkov, Naveed Afzal, Lisette Garcia Moya
  Translation universals: do they exist? A corpus-based and NLP approach
  to convergence

09:45
  Sanjika Hewavitharana, Stephan Vogel
  Enhancing a Statistical Machine Translation System by using an
  Automatically Extracted Parallel Corpus from Comparable Sources

----------------
10:15-11:00 Coffee break
10:15-11:00 Poster session 1 (see list of posters below)
----------------

11:00-12:30 Oral Session 2:
            Extracting Bilingual Lexicons from Comparable Corpora
11:00
  Iñaki Alegria, Nerea Ezeiza, Izaskun Fernandez
  Translating Named Entities using Comparable Corpora

11:30
  Pablo Gamallo Otero
  Evaluating Two Different Methods for the Task of Extracting Bilingual
  Lexicons from Comparable Corpora

12:00
  Xabier Saralegi, I. San Vicente, A. Gurrutxaga
  Automatic extraction of bilingual terms from comparable corpora in a
  popular science domain

12:30-13:30 Invited session

  Serge Sharoff (University of Leeds, UK)
  Parallel worlds: Recent advances in finding translations in
  comparable corpora

----------------
13:30-14:30 Lunch break
----------------

14:30-16:00 Oral Session 3: Linguistic Studies

14:30
  Christel Stolz, Thomas Stolz
  Functional-Typological Approaches To Parallel And Comparable
  Corpora: The Bremen Mixed Corpus

15:00
  Maria Fernanda Bacelar do Nascimento, Antónia Estrela, Amália
  Mendes, Luísa Pereira
  On the use of comparable corpora of African varieties of Portuguese
  for linguistic description and teaching/learning applications

15:30
  Oliver Culo, Silvia Hansen-Schirra, Stella Neumann, Mihaela Vela
  Empirical studies on language contrast using the English-German
  comparable and parallel CroCo corpus

----------------
16:00-16:45 Coffee break
16:00-16:45 Poster session 2 (see list of posters below)
----------------

16:45-18:00 Panel session

  Comparable corpora: varying definitions, varying uses

18:00 End of workshop

----------------
** List of poster presentations
----------------

Magnar Brekke
Term Extraction from Parallel and Comparable Text: The KB-N Legacy

Carmen Dayrell, Sandra Aluísio
Using a comparable corpus to investigate lexical patterning in English
abstracts written by non-native speakers

Meng Ji
A Comparative Approach to Diachronic Comparable Corpus Investigation

Natalie Kübler
A comparable Learner Translator Corpus: creation and use

Belinda Maia, Sérgio Matos
Corpógrafo V.4 -- tools for researchers and teachers using comparable
corpora

Emmanuel Prochasson, Kyo Kageura, Emmanuel Morin, Akiko Aizawa
Looking for Transliterations in a trilingual English, French and
Japanese Specialised Comparable Corpus

Richard Rohwer, Zhiqiang (John) Wang
Coarse Lexical Translation with no use of Prior Language Knowledge

----------------
Workshop Description

  Research in comparable corpora is motivated by the scarcity of
  parallel corpora. Parallel corpora are a key resource to mine
  translations for statistical machine translation or for building
  or extending bilingual lexicons and terminologies. However, beyond
  a few language pairs such as English-French or English-Chinese and
  a few contexts such as parliamentary debates or legal texts, they
  remain a scarce resource, despite the creation of automated
  methods to collect parallel corpora from the Web. A more
  fundamental limitation is that translated texts, however skilled
  the translators, are generally influenced by the translation
  process itself and by the language of the source texts, so they
  may not be fully adequate for the task at hand.

  This has motivated research into the use of comparable corpora:
  pairs of monolingual corpora selected according to the same set of
  criteria, but in different languages or language
  varieties. Comparable corpora overcome both limitations of
  parallel corpora: original, monolingual texts are much more
  abundant than translated texts, and they are not shaped by a
  translation process. However, because comparable texts are not
  translations of one another, mining translations from comparable
  corpora is much more challenging than from parallel corpora. What
  constitutes a good
  comparable corpus, for a given task or per se, also requires
  specific attention: while the definition of a parallel corpus is
  fairly straightforward, building a comparable corpus requires
  control over the selection of source texts in both languages.

----------------
Workshop Organisers

Pierre Zweigenbaum
    LIMSI, CNRS, Orsay, France
    & ERTIM, INALCO, Paris, France
Eric Gaussier
    LIG, Université J. Fourier, Grenoble, France
Pascale Fung
    Department of Electronic & Computer Engineering,
    University of Science & Technology, Hong Kong

Scientific Committee

Lynne Bowker (University of Ottawa, Canada)
Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
Éric Gaussier (Université Joseph Fourier, Grenoble, France)
Gregory Grefenstette (CEA/LIST, Fontenay-aux-Roses, France)
Pascale Fung (University of Science & Technology, Hong Kong)
Natalie Kübler (Université Paris Diderot, France)
Tony McEnery (Lancaster University, UK)
Emmanuel Morin (Université de Nantes, France)
Dragos Stefan Munteanu (Information Sciences Institute, Marina Del Rey, USA)
Carol Peters (ISTI-CNR, Pisa, Italy)
Reinhard Rapp (Johannes Gutenberg-Universität Mainz, Germany)
Serge Sharoff (University of Leeds, UK)
Monique Slodzian (INALCO, Paris, France)
Richard Sproat (University of Illinois at Urbana-Champaign, USA)
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora