Corpora: ACL-2001 Workshop on Data-Driven Machine Translation CFP

Priscilla Rasmussen rasmusse at
Mon Mar 19 21:32:23 UTC 2001


                              7 July  2001
                            Toulouse, France


With the increased availability of online corpora, data-driven
approaches have become central to the NL community.  A variety of
data-driven approaches have been used to help build Machine
Translation systems -- example-based, statistical MT, and other
machine learning approaches -- and there are many possibilities for
hybrid systems. We wish to bring together proponents of as many
techniques as possible to discuss which combinations will yield
maximal success in translation.

We propose to center the workshop on Data-Driven MT, by which we mean
all approaches that develop algorithms and programs to exploit data
in the development of MT, primarily large bilingual corpora created by
human translators that serve as a source of training data for MT
systems. The workshop will focus on the following topics:

- statistical machine translation (modeling, training, search)
- machine learning in translation
- example-based machine translation
- acquisition of multilingual training data
- evaluation of data-driven methods (including comparison with
  rule-based methods)
- combination of various translation systems; integration of classical
  rule-based and data-driven approaches
- word/sentence alignment

An especially important question that we wish to address is which
techniques are best for each of the subparts of a complete MT system,
e.g. learning grammars, building lexicons, parsing input data,
determining transfer principles, generating target text, etc.


         Jessie Pinkham, Microsoft Research, jessiep at
         Kevin Knight, USC/ISI, knight at
         Franz Josef Och, RWTH Aachen, och at


        Hermann Ney, RWTH Aachen


          Srinivas Bangalore, AT&T Research
          Ralf Brown, CMU
          Francisco Casacuberta, Polytechnic Univ. of Valencia
          Eugene Charniak, Brown University
          Ulf Hermjakob, USC/ISI
          Pierre Isabelle, Xerox Research Centre Europe
          Bob Moore, MSR
          Masaaki Nagata, NTT
          Norbert Reithinger, DFKI
          Philip Resnik, Univ. of Maryland
          Eiichiro Sumita, ATR
          Koichi Takeda, IBM Japan
          Enrique Vidal, Polytechnic Univ. of Valencia
          Stephan Vogel, Univ. of Kaiserslautern
          Hideo Watanabe, IBM TRL


Papers describing original work in the area of Data-Driven Machine
Translation should be submitted electronically in PostScript or PDF
format to:

       Deborah Coughlin, deborahc at

Submissions should follow the two-column format of ACL proceedings and
should not exceed eight (8) pages, including references. We strongly
recommend the use of ACL LaTeX style files or Microsoft Word style
files tailored for this year's conference. They are available from the
ACL-2001 program committee Web site at:

Because reviewing will be blind, the paper should not include the
authors' names and affiliations. Instead, each submission must be
accompanied by an email containing the following information (ASCII
text):

      TITLE: title of the paper
      AUTHORS: list of authors
      EMAIL: email of author for correspondence
      KEYWORDS: keywords, topic sub-areas, ...
      ABSTRACT: abstract of the paper


          Paper submissions             6 April 2001
          Notification of acceptance    27 April 2001
          Camera-ready copies due       16 May 2001
          Workshop dates                7 July 2001


The registration fee for the workshop will be posted at a later date.
The registration fee includes attendance at the workshop and a copy of
the workshop proceedings. Follow the registration instructions at the
ACL site and indicate that you would like to attend the Data-Driven MT
workshop.

