Conf: 5th Workshop on Building and Using Comparable Corpora at LREC 2012, Istanbul, 26 May 2012

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Fri May 11 21:01:15 UTC 2012

Date: Fri, 11 May 2012 11:01:15 +0200
From: "Reinhard Rapp" <reinhardrapp at>
Message-ID: <E4D65B80613943F4A49D72C7FAEC26A4 at medion>

 Apologies for multiple postings


  Call for Participation


  Language Resources for Machine Translation

  in Less-Resourced Languages and Domains

  Co-located with LREC 2012

  Lütfi Kirdar Istanbul Exhibition and Congress Centre

  Saturday, 26 May 2012

  Endorsed by

   * ACL SIGWAC (Special Interest Group on Web as Corpus)

   * FLaReNet (Fostering Language Resources Network)

   * META-NET (Multilingual Europe Technology Alliance)


WORKSHOP PROGRAMME (formatted version see URL above)

Saturday, 26 May 2012

09:00 Opening

Oral Presentations 1: Multilinguality (Chair: Pierre Zweigenbaum)


09:10 Philipp Petrenz, Bonnie Webber: Robust Cross-Lingual Genre
Classification through Comparable Corpora

09:30 Qian Yu, François Yvon, Aurélien Max: Revisiting sentence alignment
algorithms for alignment visualization and evaluation

Invited Project Session (Chair: Serge Sharoff)


09:50 Inguna Skadina: Analysis and Evaluation of Comparable Corpora for
Under-Resourced Areas of Machine Translation (ACCURAT)

10:10 Andrejs Vasiljevs: LetsMT! - Platform to Drive Development and
Application of Statistical Machine Translation (LetsMT!)

10:30 Coffee Break

11:00 Núria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio
Toral, Victoria Arranz: Mining and Exploiting Domain-Specific Corpora in
the PANACEA Platform (PANACEA)

11:20 Adam Kilgarriff, George Tambouratzis: The PRESEMT Project

11:40 Béatrice Daille: Building Bilingual Terminologies from Comparable
Corpora: The TTC TermSuite (TTC)

12:00 Panel Discussion with Invited Speakers

12:30 Lunch Break

Oral Presentations 2: Building Comparable Corpora (Chair: Reinhard Rapp)


14:00 Aimée Lahaussois, Séverine Guillaume: A viewing and processing
tool for the analysis of a comparable corpus of Kiranti mythology

14:20 Nancy Ide: MultiMASC: An Open Linguistic Infrastructure for
Language Research

Poster Presentations with Booster Session (Chair: Marko Tadic)


14:40 Elena Irimia: Experimenting with Extracting Lexical Dictionaries
from Comparable Corpora for: English-Romanian language pair

14:45 Iustina Ilisei, Diana Inkpen, Gloria Corpas, Ruslan Mitkov:
Romanian Translational Corpora: Building Comparable Corpora for
Translation Studies

14:50 Angelina Ivanova: Evaluation of a Bilingual Dictionary Extracted
from Wikipedia

14:55 Quoc Hung-Ngo, Werner Winiwarter: A Visualizing Annotation Tool
for Semi-Automatically Building a Bilingual Corpus

15:00 Lene Offersgaard, Dorte Haltrup Hansen: SMT systems for
less-resourced languages based on domain-specific data

15:05 Magdalena Plamada, Martin Volk: Towards a Wikipedia-extracted
Alpine Corpus

15:10 Sanja Stajner, Ruslan Mitkov: Using Comparable Corpora to Track
Diachronic and Synchronic Changes in Lexical Density and Lexical

15:15 Dan Stefanescu: Mining for Term Translations in Comparable Corpora

15:20 George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos,
Marina Vassiliou: Accurate phrase alignment in a bilingual corpus for
EBMT systems

15:25 Katerina Veselovská, Nguy Giang Linh, Michal Novák: Using
Czech-English Parallel Corpora in Automatic Identification of 'It'

15:30 Manuela Yapomo, Gloria Corpas, Ruslan Mitkov: CLIR- and
Ontology-Based Approach for Bilingual Extraction of Comparable Documents

15:35 Poster Session and Coffee Break (coffee from 16:00 - 16:30)

Oral Presentations 3: Lexicon Extraction and Corpus Analysis
(Chair: Andrejs Vasiljevs)


16:30 Amir Hazem, Emmanuel Morin: ICA for Bilingual Lexicon Extraction
from Comparable Corpora

16:50 Hiroyuki Kaji, Takashi Tsunakawa, Yoshihiro Komatsubara: Improving
Compositional Translation with Comparable Corpora

17:10 Nikola Ljubesic, Spela Vintar, Darja Fiser: Multi-word term
extraction from comparable corpora by combining contextual and
constituent clues

17:30 Robert Remus, Mathias Bank: Textual Characteristics of
Different-sized Corpora

17:50 Wrap-up discussion and end of the workshop


WORKSHOP ORGANISERS

  Reinhard Rapp, Universities of Mainz (Germany) and Leeds (UK)
  Marko Tadic,  University of Zagreb (Croatia)
  Serge Sharoff, University of Leeds (UK)
  Andrejs Vasiljevs, Tilde SIA, Riga (Latvia)
  Pierre Zweigenbaum, LIMSI, CNRS, Orsay, and ERTIM, INALCO, Paris (France)


PROGRAMME COMMITTEE

* Srinivas Bangalore (AT&T Labs, USA)
* Caroline Barrière (National Research Council Canada) 
* Chris Biemann (Microsoft / Powerset, San Francisco, USA) 
* Lynne Bowker (University of Ottawa, Canada) 
* Hervé Déjean (Xerox Research Centre Europe, Grenoble, France) 
* Andreas Eisele (DFKI, Saarbrücken, Germany) 
* Rob Gaizauskas (University of Sheffield, UK) 
* Éric Gaussier (Université Joseph Fourier, Grenoble, France) 
* Nikos Glaros (ILSP, Athens, Greece) 
* Gregory Grefenstette (Exalead/Dassault Systemes, Paris, France) 
* Silvia Hansen-Schirra (University of Mainz, Germany) 
* Kyo Kageura (University of Tokyo, Japan) 
* Adam Kilgarriff (Lexical Computing Ltd, UK) 
* Natalie Kübler (Université Paris Diderot, France) 
* Philippe Langlais (Université de Montréal, Canada) 
* Tony McEnery (Lancaster University, UK) 
* Emmanuel Morin (Université de Nantes, France) 
* Dragos Stefan Munteanu (Language Weaver Inc., USA) 
* Lene Offersgaard (University of Copenhagen, Denmark) 
* Reinhard Rapp (Universities of Mainz, Germany, and Leeds, UK) 
* Sujith Ravi (Yahoo! Research, Santa Clara, CA, USA) 
* Serge Sharoff (University of Leeds, UK) 
* Michel Simard (National Research Council Canada) 
* Inguna Skadina (Tilde, Riga, Latvia) 
* Monique Slodzian (INALCO, Paris, France) 
* Benjamin Tsou (The Hong Kong Institute of Education, China) 
* Dan Tufis (Romanian Academy, Bucharest, Romania) 
* Justin Washtell (University of Leeds, UK) 
* Michael Zock (LIF, CNRS Marseille, France) 
* Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France) 

For further information, please contact

   Reinhard Rapp reinhardrapp (at) gmx (dot) de
   or Marko Tadic marko.tadic (at) ffzg (dot) hr
