Co-organised by ELSNET

Toulouse, Saturday 7th July 2001


At a workshop at ACL 2000 in Hong Kong dedicated to Infrastructures
for Global Collaboration there was an agreement between the main
professional organisations in NLP and Speech (ACL and ISCA), and
ELSNET, and the other meeting participants, that it would be useful to
aim at a broadly supported, joint repository or catalogue for
tools and materials for the language and speech communities.

An ELSNET-sponsored workshop on educational issues held at EACL99
concluded that certain non-transient infrastructures needed to be
instigated to raise the public perception of educational issues in
NLP. It also concluded that a repository of shared materials,
appropriately indexed for educational usage, would be a useful point
of departure.

This workshop will build on the consensus reached at these previous
workshops. There will be two clear foci: one on instruments for
sharing tools and resources in general, addressing practical
problems, and the other on the technological and infrastructural
issues surrounding the educational uses of repositories.

Good examples of existing initiatives in this area include, among
others, the ACL Natural Language Software Registry (hosted at DFKI),
which was set up as a repository for tools for the distinct fields of
Human Language Technology (HLT); the ELRA/ELDA, LDC, TELRI and ELSNET
resource catalogues and repositories; OLAC (a worldwide network of
language archives); and JEWELS, an as-yet incomplete EU-funded website
for educational materials in Language and Speech.

A third theme concerns how to build upon existing initiatives as
sources of data or inspiration.


The main goal of the workshop is to discuss methods for the
improvement and extension of existing repositories; the educational
uses of repositories; the closer interlinking between different kinds
of repositories (tools and resources); and global infrastructures for
the achievement of joint actions. However, we expect the scope of the
workshop to be much wider than that, as the issues addressed are of
general interest to everybody who believes that sharing tools and
resources is essential for the progress of research and education in
our field.

Contributions of papers and demonstrations are solicited that address
the above themes.  The following list of topics is suggestive rather
than exhaustive:

* Repositories versus catalogues

* Mechanisms and infrastructures for sharing and describing content
* Repository management
* Standards for exchange, description, and annotation
* Metadata descriptions
* Quality assessment
* Structure and content of an NLP/CL repository
* Tools and materials for NLP/CL education
* Web-based teaching methods for NLP/CL


* Electronic submissions only (PostScript, Word, or PDF), following the
  appropriate ACL LaTeX style or Microsoft Word style. Submissions
  should not exceed eight (8) pages, including references. You can
  download the appropriate style or template files using the following
  link: In case of problems with the
  submission format, please contact one of the co-chairs.

* Submissions to either co-chair (mros at and
  declerck at). All submissions will be acknowledged.

* Please provide a list of keywords in the separate header page and
  indicate the best fitting subtopic(s) from the above list.


* Demos may be submitted with or without an accompanying paper.

* Please write a 2-page description of the demo and send it
  to either co-chair. Please let us know about any special hardware
  requirements over and above the standard PC + beamer without
  internet access provided by default.


- Thierry Declerck (DFKI) Co-chair (Repository) declerck at
- Mike Rosner (Malta) Co-chair (Education) mros at

- Steven Bird (U. Penn)
- Bill Black (UMIST, Manchester, UK) wjb at
- Gosse Bouma (University of Groningen) gosse at
- Koenraad de Smedt (University of Bergen) desmedt at
- Claire Gardent (CNRS, Nancy) claire.gardent at
- Steven Krauwer (Utrecht University) steven.krauwer at
- Donna Harman (NIST)
- Julia Hirschberg (AT&T, ISCA)
- Jun'ichi Tsujii (Tokyo)
- Andy Way (Dublin City University) away at


  * Submission Deadline:    6th April 2001
  * Notification Date:      27th April 2001
  * Camera ready copy due:  16th May 2001



Michael Rosner
email: mros at

Thierry Declerck
email: declerck at


        Call for papers
        Workshop on Data-driven MT
        ACL'2001 Conference
        Toulouse, France
        Invited speaker:  Hermann Ney, RWTH Aachen
        Deadline for paper submissions:                 April 6, 2001
        Deadline for notification of paper acceptance:  April 27, 2001
        Deadline for camera-ready papers:               May 16, 2001

        Workshop Date:                                  July 7, 2001

        Details on submissions listed below.

        With the increased availability of online corpora, data-driven
 approaches have become central to the NL community.  A variety of
 data-driven approaches have been used to help build Machine Translation
 systems -- example-based, statistical MT, and other machine learning
 approaches -- and there are all sorts of possibilities for hybrid systems.
 We wish to bring together proponents of as many techniques as possible to
 engage in a discussion of which combinations will yield maximal success in
 translation.

        We propose to center the workshop on data-driven MT, by which we
 mean all approaches which develop algorithms and programs to exploit data
 in the development of MT, primarily the use of large bilingual corpora
 created by human translators, and serving as a source of training data for
 MT systems. We are specifically interested in papers about

                *       statistical machine translation (modeling, training,
                *       machine learning in translation
                *       example-based machine translation
                *       acquisition of multilingual training data
                *       evaluation of data-driven methods (also with
                        rule-based methods)
                *       combination of various translation systems;
                        integration of classical rule-based and
                        data-driven approaches
                *       word/sentence alignment methods

        An especially important question that we wish to address is which
 techniques are best for each of the subparts of a complete MT system,
 e.g. learning grammars, building lexicons, parsing input data,
 determining transfer principles, generating target text, etc.

        We strongly encourage papers on systems which show demonstrable
 progress over previously chosen methods, and which have been integrated
 in an actual end-to-end system. Papers with test results or demos will be
 given strongest preference for participation.

        Jessie Pinkham, Microsoft Research; jessiep at

        Kevin Knight, USC/ISI; knight at

        Franz Josef Och, RWTH Aachen; och at

        Electronic submissions only; send the PostScript or PDF version of
 your submission to: Deborah Coughlin, deborahc at

        Submissions should follow the two-column format of ACL proceedings
 and should not exceed eight (8) pages, including references. We strongly
 recommend the use of the ACL LaTeX style files or Microsoft Word style
 files tailored for this year's conference. They are available from the
 ACL-2001 program committee Web site.

        As reviewing will be blind, a separate identification page must be
 sent by email. The identification page should include the paper title,
 authors' names, affiliations, and email addresses, up to 5 keywords
 specifying the subject area, and a short summary (up to 5 lines).
 The paper itself should not include the authors' names and affiliations.

