15.1165, Calls: Computational Ling/Spain

Fri Apr 9 12:33:20 UTC 2004

LINGUIST List:  Vol-15-1165. Fri Apr 9 2004. ISSN: 1068-4875.

Subject: 15.1165, Calls: Computational Ling/Spain

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Andrea Berez <andrea at linguistlist.org>
 ==========================================================================
As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

=================================Directory=================================

1)
Date:  Fri, 9 Apr 2004 08:27:47 -0400 (EDT)
From:  rmalouf rmalouf at mail.sdsu.edu
Subject:  ACL04 Workshop: Tackling the Challenges of Terascale Human Language Problems

-------------------------------- Message 1 -------------------------------

Date:  Fri, 9 Apr 2004 08:27:47 -0400 (EDT)
From:  rmalouf rmalouf at mail.sdsu.edu
Subject:  ACL04 Workshop: Tackling the Challenges of Terascale Human Language Problems

ACL04 Workshop: Tackling the Challenges of Terascale Human Language
Problems

PLEASE NOTE THE CORRECTED DEADLINE!

Short Title: Terascale NLP 2004

Date: 26-Jul-2004 - 26-Jul-2004
Location: Barcelona, Spain
Contact: Rob Malouf
Contact Email: rmalouf at mail.sdsu.edu
Meeting URL: http://www-rohan.sdsu.edu/~malouf/terascale04.html

Linguistic Sub-field: Computational Linguistics

Call Deadline: 18-Apr-2004

Meeting Description:

Machine learning methods form the core of most modern speech and
language processing technologies. Techniques such as kernel methods,
log-linear models, and graphical models are routinely used to classify
examples (e.g., to identify the topic of a story), rank candidates (to
order a set of parses for some sentence) or assign labels to sequences
(to identify named entities in a sentence). While considerable success
has been achieved using these algorithms, what has become increasingly
clear is that the size and complexity of the problems---in terms of
the number of training examples, the size of the feature space, and the
size of the prediction space---are growing at a much faster rate than
our computational resources are, Moore's Law notwithstanding. This
raises real questions as to whether our current crop of algorithms
will scale gracefully when processing such problems. For example,
training Support Vector Machines for relatively
small-scale problems, such as classifying phones in the TIMIT speech
dataset, will take an estimated six years of CPU time (Salomon et
al. 2002).  If we wished to move to a larger domain and harness, say,
all the speech data emerging from a typical call center, then
enormous computational resources would clearly need to be devoted
to the task.

Allocation of such vast amounts of computational resources is beyond
the scope of most current research collaborations, which consist of
small groups of people working on isolated tasks using small networks
of commodity machines. The ability to deal with large-scale speech and
language problems requires a move away from isolated individual groups
of researchers towards co-ordinated `virtual organizations'.

The terascale problems that are now emerging demand an understanding
of how to manage people and resources possibly distributed over many
sites.  Evidence of the timely nature of this workshop can be seen at
this year's Text Retrieval Conference (TREC), which concluded with
the announcement of a new track for next year specifically
devoted to scaling information retrieval systems. This clearly
demonstrates the community's need to scale human language
technologies.

In order to address the large-scale speech and language problems that
arise in realistic tasks, we must consider scalable machine learning
algorithms that can better exploit the structure of such problems,
their computational resource requirements, and the implications for
how we carry out research as a community.

This workshop will bring researchers together who are interested in
meeting the challenges associated with scaling systems for natural
language processing.  Topics include (but are not limited to):

  + scaling existing techniques exactly

  + applying interesting approximations which drastically reduce the
    amount of required computation yet do not sacrifice much in the way
    of accuracy

  + using on-line learning algorithms to learn from streaming data sources

  + efficiently retraining models as more data becomes available

  + experience with using very large datasets, for example by applying
    Grid computing technologies

  + techniques for efficiently manipulating enormous volumes of data

  + human factors associated with managing large virtual organizations

  + adapting methods developed for dealing with large-scale problems
    in other computational sciences, such as physics and biology, to natural
    language processing

---------------------------------------------------------------------------
LINGUIST List: Vol-15-1165