[Corpora-List] HLT/NAACL-2003 Workshop CFP: Building and Using Parallel Texts: Data Driven Machine Translation & Beyond

Priscilla Rasmussen rasmusse at cs.rutgers.edu
Mon Feb 10 16:50:43 UTC 2003


		** Apologies for Multiple Postings! **

               -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
                      C A L L   F O R   P A P E R S

                   Building and Using Parallel Texts:
                Data Driven Machine Translation and Beyond

                       An HLT-NAACL 2003 Workshop
                           Edmonton, Alberta
                         May 31 or June 1, 2003

  	            http://www.cs.unt.edu/~rada/wpt

               -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The goal of this workshop is to provide a forum for researchers working on
problems related to the creation and use of parallel text. Recent events
have demonstrated once again the importance of inter-language
communication and reinforced the need for advances in machine translation
(MT) and multi-lingual processing tools.

The workshop will be centered around the problem of building and using
parallel corpora, which are vital resources for efficiently deriving
multi-lingual text processing tools. In addition to regular papers, the
workshop also includes a shared task that will result in a comparative
evaluation of word alignment techniques.

We invite submissions of papers addressing any of the following issues:

- Construction of parallel corpora, including the automatic identification
and harvesting of parallel corpora from the Web
- Methods to evaluate the quality of parallel corpora and word alignments
- Tools for processing parallel corpora, including automatic sentence
alignment, word alignment, phrase alignment, detection of omissions and
gaps in translations, and others
- Using parallel corpora for data-driven machine translation
- Using parallel corpora for the derivation of language processing tools
in new languages
- Using parallel corpora for automatic corpus annotation
- Language learning applied to parallel corpora
- Translation memory systems as a source of aligned corpora

While we invite submissions addressing any of the above topics or related
issues, we particularly welcome work on parallel corpora for languages
with scarce resources.

We expect to make arrangements with a journal in Natural Language
Processing or Computational Linguistics for a special issue that will
include selected papers from this workshop.

Invited Speaker:
-=-=-=-=-=-=-=-=

Elliot Macklovitch, University of Montreal

Shared Task:
-=-=-=-=-=-=

All researchers who have a word alignment system available are invited
to participate in the shared task, individually or as part of a team.

Participants in the shared task will be provided with common sets of
training data, consisting of Romanian-English and French-English parallel
texts. Participants will be given approximately one month to train their
systems with this data, and then previously held-out test data will be
released. Participants will run their alignment system on this test data
and submit their results, which will be evaluated using a common set of
metrics. See the workshop website for details regarding the shared task.


Submission format:
-=-=-=-=-=-=-=-=-=

Submissions should consist of regular full papers of max. 7 pages,
formatted following the NAACL 2003 guidelines. In addition, teams
participating in the word alignment shared task are invited to submit
short papers (max. 4 pages) describing their systems and/or evaluation
methodology.

Send your submission (a PS or PDF file), prepared for anonymous review,
to both:

Rada Mihalcea, University of North Texas, rada at cs.unt.edu
and
Ted Pedersen, University of Minnesota, Duluth, tpederse at d.umn.edu

Important dates:
-=-=-=-=-=-=-=-=

Deadline for regular paper submissions: March 10
Deadline for results submissions (shared task): March 25
Deadline for short paper submissions (shared task): April 1
Notification of acceptance for regular papers: April 1
Deadline for camera-ready papers: April 10

Organizing Committee:
-=-=-=-=-=-=-=-=-=-=-

Rada Mihalcea, University of North Texas
Ted Pedersen, University of Minnesota, Duluth

Program Committee:
-=-=-=-=-=-=-=-=-=

Lars Ahrenberg, Linkoping University
Nicoletta Calzolari, University of Pisa
Tim Chklovski, Massachusetts Institute of Technology
Mona Diab, University of Maryland
Ulrich Germann, Information Sciences Institute
Daniel Gildea, University of Pennsylvania
Maria das Gracas Volpe Nunes, University of Sao Paulo
Nancy Ide, Vassar College
Lucia Helena Machado Rino, Federal University of Sao Carlos
Eduard Hovy, University of Southern California / Information Sciences Institute
Philippe Langlais, University of Montreal
Elliot Macklovitch, University of Montreal
Daniel Marcu, University of Southern California / Information Sciences Institute
Dan Melamed, New York University
Magnus Merkel, Linkoping University
Ruslan Mitkov, University of Wolverhampton
Hermann Ney, RWTH Aachen
Franz Och, Information Sciences Institute
Kemal Oflazer, Sabanci University
Kishore Papineni, IBM
Jessie Pinkham, Microsoft Research
Andrei Popescu-Belis, ISSCO/TIM/ETI University of Geneva
Florence Reeder, MITRE
Philip Resnik, University of Maryland
Antonio Ribeiro, Joint Research Centre, Ispra, Italy
Michel Simard, University of Montreal
Harold Somers, University of Manchester Institute of Science and Technology
Arturo Trujillo, Canon Research Centre Europe
Jean Veronis, University of Provence
Clare Voss, Army Research Lab
Yorick Wilks, University of Sheffield


