Last Call: COLING-2000 Workshop on Toolsets and Architectures

Sun May 14 17:42:58 UTC 2000

Call for Papers for the

COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems

Centre Universitaire, Luxembourg, 5 August 2000

(see also this call at http://crl.nmsu.edu/Events/COLING00)

Background

The purpose of the workshop is to present the state of the art in NLP
toolsets and workbenches that can be used to develop multilingual
and/or multi-application NLP components and systems. Although
technical presentations of particular toolsets are of interest, we
would like to emphasize methodologies and practical experiences in
building components or full applications using an NLP
toolset. Combined demonstrations and paper presentations are strongly
encouraged.

Many toolsets have been developed to support the implementation of
single NLP components (taggers, parsers, generators, dictionaries) or
complete Natural Language Processing applications (Information
Extraction systems, Machine Translation systems).  These tools aim at
facilitating and lowering the cost of building NLP systems. Since the
tools themselves are often complex pieces of software, they require a
significant amount of effort to be developed and maintained in the
first place. Is this effort worth the trouble?  Note that NLP toolsets
have often been developed originally to implement a single component
or application. In this case, why not build the NLP system using a
general programming language such as Lisp or Prolog? There are at
least two answers. First, for reasons of pure efficiency
(speed and space), it is often preferable to build a parameterized
algorithm operating on a uniform data structure (e.g., a
phrase-structure parser). Second, it is harder, and often impossible,
to develop, debug and maintain a large NLP system directly written in
a general programming language.

It has been the experience of many users that a given toolset is quite
often unusable outside its environment: the toolset can be too
restricted in its purpose (e.g. an MT toolset that cannot be used for
building a grammar checker), too complex to use, or even too difficult
to install. There have been efforts, in particular in the US under the
Tipster program, to promote common architectures instead for a given
set of applications (primarily IR and IE in Tipster; see also the
Galaxy architecture of the DARPA Communicator project). Several
software environments have been built around this flexible concept,
which is closer to current trends in mainstream software engineering.

The workshop aims at providing a picture of the current problems faced
by developers and users of toolsets, and future directions for the
development and use of NLP toolsets. We encourage reports of actual
experiences in the use of toolsets (complexity, training, learning
curve, cost, benefits, user profiles) as well as presentations of
toolsets concentrating on user issues (GUIs, methodologies, on-line
help, etc.)  and application development. Demonstrations are also
welcome.

Audience

Researchers and practitioners in Language Engineering, users and
developers of tools and toolsets.

Issues

Although individual tools (such as a POS tagger) have their uses, they
typically need to be integrated into a complete application (e.g. an IR
system). Language Engineering issues in toolsets and architectures
include (in no particular order):

  Practical experience in the use of a toolset;
  Methodological issues associated with the use of a toolset;
  Benefits and deficiencies of toolsets;
  User (linguist/programmer) training and support;
  Adaptation of a tool (or toolset) to a new kind of application;
  Adaptation of a tool to a new language;
  Integration of a tool in an application;
  Architectures and support software;
  Reuse of data resources vs. processing components;
  NLP algorithmic libraries.

Format of the Workshop

The one-day workshop will include twelve presentation periods, each
consisting of a 20-minute presentation followed by 10 minutes
reserved for exchanges. We encourage the authors to focus on the
salient points of their presentation and identify possibly
controversial positions.  There will be ample time set aside for
informal and panel discussions and audience participation. Please note
that workshop participants are required to register at
http://www.coling.org/reg.html.

Deadlines

   21 May 2000: Submission deadline.
   11 June 2000: Notification to authors.
   24 June 2000: Final camera-ready copy.
   5 August 2000: COLING-2000 Workshop.

Submission Format

Send submissions of no more than 6 pages conforming to the COLING
format to zajac at crl.nmsu.edu. We prefer electronic submissions using
either PDF or Postscript. Final submissions can extend to 10 pages.

Organizing Committee

  Rémi Zajac (Chair), CRL, New Mexico State University, USA:
       zajac at crl.nmsu.edu.
  Jan Amtrup, CRL, New Mexico State University, USA:
      jamtrup at crl.nmsu.edu.
  Stephan Busemann, DFKI, Saarbrucken:
       busemann at dfki.de.
  Hamish Cunningham, University of Sheffield:
      hamish at dcs.shef.ac.uk.
  Guenther Goerz, IMMD VIII, University of Erlangen:
      goerz at immd8.informatik.uni-erlangen.de.
  Gertjan van Noord, University of Groningen:
      vannoord at let.rug.nl.
  Fabio Pianesi, IRST, Trento:
      pianesi at irst.itc.it.

Of Related Interest

  The Natural Language Software Registry at
      http://www.dfki.de/lt/registry/sections.html
  The COLING-2000 Web Site at http://www.coling.org/

---