Call: ACL Workshops

Philippe Blache pb at
Tue Mar 9 14:00:54 UTC 1999

From: Priscilla Rasmussen <rasmusse at>

Below are 1) a new ACL'99 workshop announcement on Unsupervised Learning
in NLP, and 2) a slightly revised announcement for the joint EMNLP and
WVLC ACL'99 workshop. These are separated by asterisks (*).

			ACL-99 Workshop
         Unsupervised Learning in Natural Language Processing

            University of Maryland, College Park, MD, USA
		       June 21st, 1999

   Endorsed by the Association for Computational Linguistics (ACL)
     Special Interest Group on Natural Language Learning (SIGNLL)


Many of the successes achieved from using learning techniques in
natural language processing (NLP) have utilized the supervised
paradigm, in which models are trained from data annotated with the
target concepts to be learned.  For instance, the target concepts in
language modeling for speech recognition are words, and thus raw text
corpora suffice.  The first successful part-of-speech taggers were
made possible by the existence of the Brown corpus (Francis, 1964), a
million-word data set which was laboriously hand-tagged a quarter of a
century prior.  Finally, progress in statistical parsing required the
development of the Penn Treebank data set (Marcus et al. 1993), the
result of many staff years of effort.  While it is worthwhile to
utilize annotated data when it is available, the future success of
learning for natural language systems cannot depend on a paradigm
requiring that large, annotated data sets be created for each new
problem or application.  Annotation is prohibitively time- and
expertise-intensive, and the resulting corpora are too
susceptible to restriction to a particular domain, application, or

Thus, long-term progress in NLP is likely to be dependent on the use
of unsupervised and weakly supervised learning techniques, which do
not require large annotated data sets.  Unsupervised learning utilizes
raw, unannotated data to discover underlying structure giving rise to
emergent patterns and principles.  Weakly supervised learning uses
supervised learning on small, annotated data sets to seed unsupervised
learning using much larger, unannotated data sets.  Because these
techniques are capable of identifying new and unanticipated
correlations in data, they have the additional advantage of being able
to feed new insights back into more traditional lines of basic

Unsupervised and weakly supervised methods have been used successfully
in several areas of NLP, including acquiring verb subcategorization
frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill,
1997), word sense disambiguation (Yarowsky, 1995), and prepositional
phrase attachment (Ratnaparkhi, 1998).  The goal of this workshop is
to discuss, promote, and present new research results (positive and
negative) in the use of such methods in NLP.  We encourage submissions
on work applying learning to any area of language interpretation or
production in which the training data does not come fully annotated
with the target concepts to be learned, including:

 * Fully unsupervised algorithms
 * `Weakly supervised' learning, bootstrapping models from small sets
    of annotated data
 * `Indirectly supervised' learning, in which end-to-end task
     evaluation drives learning in an embedded language interpretation
 * Exploratory data analysis techniques applied to linguistic data
 * Unsupervised adaptation of existing models in changing environments
 * Quantitative and qualitative comparisons of results obtained with
    supervised and unsupervised learning approaches

Position papers on the pros and cons of supervised vs. unsupervised
learning will also be considered.


Paper submissions can take the form of extended abstracts or full
papers, not to exceed six (6) pages.  Authors of extended abstracts
should note the short timespan between notification of acceptance and
the final paper deadline.  Up to two more pages may be allocated for
the final paper depending on space constraints.

Authors are requested to submit one electronic version of their papers
*or* four hardcopies. Please submit hardcopies only if electronic
submission is impossible.  Submissions in Postscript or PDF format are
strongly preferred.

If possible, please conform with the traditional two-column ACL
Proceedings format. Style files can be downloaded from

Email submissions should be sent to: kehler at

Hard copy submissions should be sent to:

  Andrew Kehler
  SRI International
  333 Ravenswood Avenue
  Menlo Park, CA 94025


Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30


Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)


Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
Andrew McCallum (Carnegie Mellon University and Just Research)
Ray Mooney (University of Texas, Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)


			Second Call For Papers

                	VERY LARGE CORPORA

     Sponsored by SIGDAT (ACL's Special Interest Group for Linguistic
Data and Corpus-based Approaches to NLP)

                          June 21-22, 1999
                       University of Maryland

                           In conjunction with
                     ACL'99: the 37th Annual Meeting of the
                     Association for Computational Linguistics

This SIGDAT-sponsored joint conference will continue to provide a forum
for new research in corpus-based and/or empirical methods in NLP.  In
addition to providing a general forum, the theme for this year is
"Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and
other Applied Systems".

A large number of systems in automatic speech recognition (ASR) and
synthesis, machine translation (MT), information retrieval (IR), optical
character recognition (OCR), and handwriting recognition have become
commercially available in the last decade.  Many of these systems use
NLP technologies as an important component.  Corpus-based and
empirical methods in NLP have been a major trend in recent years.  How
useful are these techniques when applied to real systems, especially when
compared to rule-based methods?  Are there any new techniques to be
developed in EMNLP and from VLC in order to improve the state-of-the-art
of ASR, MT, IR, OCR, and other applied systems?  Are there new ways to
combine corpus-based and empirical methods with rule-based systems?

This two-day conference aims to bring together academic researchers and
industrial practitioners to discuss the above issues, through technical
paper sessions, invited talks, and panel discussions. The goal of the
conference is to raise awareness of what kinds of new EMNLP techniques
need to be developed in order to bring about the next breakthrough in
speech recognition and synthesis, machine translation, information
retrieval and other applied systems.


The conference solicits paper submissions in (but not limited to) the
following areas:

1) Original work in one of the following technologies and its relevance
to speech, MT, or IR:
      (a) word sense disambiguation
      (b) word and term segmentation and extraction
      (c) alignment
      (d) bilingual lexicon extraction
      (e) POS tagging
      (f) statistical parsing
      (g) dialog models
      (h) others (please specify)

2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other
applied systems (please specify).

3) Comparative evaluation of the performance of EMNLP technologies in
one of the areas in (1) and that of its rule-based or knowledge-based
counterpart in a speech, MT, IR, OCR or other applied system.

Submission Requirements

Submissions should be limited to original, evaluated work. All papers
should include a background survey and/or references to previous work.  The
authors should provide an explicit explanation when there is no evaluation in
their work. We encourage paper submissions related to the conference theme.
In particular, we encourage the authors to include in their papers
proposals and discussions of the relevance of their work to the
theme. However, there will be a special session in the conference to
include corpus-based and/or empirical work in all areas of natural language

Submission Format

Only hard-copy submissions will be accepted. Reviewing of papers will
not be blind. The submission format and word limit are the same as those
for ACL this year. We strongly recommend the use of ACL-standard LaTeX
(plus bibstyle and trivial example) or Word style files for the
preparation of submissions. Six copies of the full-length paper (not to exceed
3200 words exclusive of references) should be received at the following
address on or before March 31, 1999.

EMNLP/VLC-99 Program Committee
c/o Pascale Fung
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong

Important Dates

March 31      Submission of full-length paper
April 30      Acceptance notice
May 20        Camera-ready paper due
June 21-22    Conference date

Program Chair

Pascale Fung
Human Language Technology Center
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Tel: (+852) 2358 8537
Fax: (+852) 2358 1485
Email: pascale at

Program Co-Chair
Joe Zhou
LEXIS-NEXIS, a Division of Reed Elsevier
9555 Springboro Pike
Dayton, OH 45342
Email: joez at

Program Committee (partial list)

Jiang-Shin Chang (Behavior Design Corp.)
Ken Church (AT&T Labs--Research)
Ido Dagan (Bar-Ilan University)
Marti Hearst (UC-Berkeley)
Huang, Changning (Tsinghua University)
Pierre Isabelle (Xerox Research Europe)
Lillian Lee (Cornell University)
David Lewis (AT&T Research)
Dan Melamed (West Law Research)
Masaaki Nagata (NTT)
Steve Richardson (Microsoft Research)
Richard Sproat (AT&T Labs--Research)
Andreas Stolcke (SRI)
Ralph Weischedel (BBN)
Dekai Wu (Hong Kong University of Science & Technology)
David Yarowsky (Johns Hopkins University)
