Call: ACL Workshops
Philippe Blache
pb at harar.lpl.univ-aix.fr
Tue Mar 9 14:00:54 UTC 1999
From: Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Below are 1) a new ACL'99 workshop announcement on Unsupervised Learning
in NLP, and 2) a slightly revised announcement for the joint EMNLP and
WVLC ACL'99 workshop. These are separated by asterisks (*).
----------------------------------------------------------------------
ACL-99 Workshop
Unsupervised Learning in Natural Language Processing
University of Maryland, College Park, MD, USA
June 21st, 1999
http://www.ai.sri.com/~kehler/unsup-acl-99.html
Endorsed by the Association for Computational Linguistics (ACL)
Special Interest Group on Natural Language Learning (SIGNLL)
WORKSHOP DESCRIPTION
Many of the successes achieved by applying learning techniques to
natural language processing (NLP) have relied on the supervised
paradigm, in which models are trained from data annotated with the
target concepts to be learned. For instance, the target concepts in
language modeling for speech recognition are words, and thus raw text
corpora suffice. The first successful part-of-speech taggers were
made possible by the existence of the Brown corpus (Francis, 1964), a
million-word data set which was laboriously hand-tagged a quarter of a
century prior. Finally, progress in statistical parsing required the
development of the Penn Treebank data set (Marcus et al., 1993), the
result of many staff years of effort. While it is worthwhile to
utilize annotated data when it is available, the future success of
learning for natural language systems cannot depend on a paradigm
requiring that large, annotated data sets be created for each new
problem or application. Annotation is prohibitively expensive in both
time and expertise, and the resulting corpora are too easily
restricted to a particular domain, application, or genre.
Thus, long-term progress in NLP is likely to depend on the use
of unsupervised and weakly supervised learning techniques, which do
not require large annotated data sets. Unsupervised learning utilizes
raw, unannotated data to discover underlying structure giving rise to
emergent patterns and principles. Weakly supervised learning uses
supervised learning on small, annotated data sets to seed unsupervised
learning using much larger, unannotated data sets. Because these
techniques are capable of identifying new and unanticipated
correlations in data, they have the additional advantage of being able
to feed new insights back into more traditional lines of basic
research.
Unsupervised and weakly supervised methods have been used successfully
in several areas of NLP, including acquiring verb subcategorization
frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill,
1997), word sense disambiguation (Yarowsky, 1995), and prepositional
phrase attachment (Ratnaparkhi, 1998). The goal of this workshop is
to discuss, promote, and present new research results (positive and
negative) in the use of such methods in NLP. We encourage submissions
on work applying learning to any area of language interpretation or
production in which the training data does not come fully annotated
with the target concepts to be learned, including:
* Fully unsupervised algorithms
* `Weakly supervised' learning, bootstrapping models from small sets
of annotated data
* `Indirectly supervised' learning, in which end-to-end task
evaluation drives learning in an embedded language interpretation
module
* Exploratory data analysis techniques applied to linguistic data
* Unsupervised adaptation of existing models in changing environments
* Quantitative and qualitative comparisons of results obtained with
supervised and unsupervised learning approaches
Position papers on the pros and cons of supervised vs. unsupervised
learning will also be considered.
FORMAT FOR SUBMISSION
Paper submissions can take the form of extended abstracts or full
papers, not to exceed six (6) pages. Authors of extended abstracts
should note the short timespan between notification of acceptance and
the final paper deadline. Up to two more pages may be allocated for
the final paper depending on space constraints.
Authors are requested to submit one electronic version of their papers
*or* four hardcopies. Please submit hardcopies only if electronic
submission is impossible. Submissions in PostScript or PDF format are
strongly preferred.
If possible, please conform to the traditional two-column ACL
Proceedings format. Style files can be downloaded from
ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/.
Email submissions should be sent to: kehler at ai.sri.com
Hard copy submissions should be sent to:
Andrew Kehler
SRI International
333 Ravenswood Avenue
EK272
Menlo Park, CA 94025
TIMETABLE
Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30
ORGANIZERS
Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)
PROGRAM COMMITTEE
Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
Andrew McCallum (Carnegie-Mellon University and Just Research)
Ray Mooney (University of Texas, Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)
***************************************************************************
Second Call For Papers
(EMNLP/VLC-99) JOINT SIGDAT CONFERENCE ON
EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND
VERY LARGE CORPORA
Sponsored by SIGDAT (ACL's Special Interest Group for Linguistic
Data and Corpus-based Approaches to NLP)
June 21-22, 1999
University of Maryland
In conjunction with
ACL'99: the 37th Annual Meeting of the
Association for Computational Linguistics
This SIGDAT-sponsored joint conference will continue to provide a forum
for new research in corpus-based and/or empirical methods in NLP. In
addition to providing a general forum, the theme for this year is
"Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and
other Applied Systems".
A large number of systems in automatic speech recognition (ASR) and
synthesis, machine translation (MT), information retrieval (IR), optical
character recognition (OCR), and handwriting recognition have become
commercially available in the last decade. Many of these systems use
NLP technologies as an important component. Corpus-based and
empirical methods in NLP have been a major trend in recent years. How
useful are these techniques when applied to real systems, especially when
compared to rule-based methods? Are there new EMNLP and VLC techniques
that could be developed to improve the state of the art
in ASR, MT, IR, OCR, and other applied systems? Are there new ways to
combine corpus-based and empirical methods with rule-based systems?
This two-day conference aims to bring together academic researchers and
industrial practitioners to discuss the above issues, through technical
paper sessions, invited talks, and panel discussions. The goal of the
conference is to raise awareness of what kinds of new EMNLP techniques
need to be developed in order to bring about the next breakthrough in
speech recognition and synthesis, machine translation, information
retrieval and other applied systems.
Scope
The conference solicits paper submissions in areas including, but not
limited to, the following:
1) Original work in one of the following technologies and its relevance
to speech, MT, or IR:
(a) word sense disambiguation
(b) word and term segmentation and extraction
(c) alignment
(d) bilingual lexicon extraction
(e) POS tagging
(f) statistical parsing
(g) dialog models
(h) others (please specify)
2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other
applied systems (please specify).
3) Comparative evaluation of the performance of an EMNLP technology in
one of the areas in (1) against that of its rule-based or knowledge-based
counterpart in a speech, MT, IR, OCR, or other applied system.
Submission Requirements
Submissions should be limited to original, evaluated work. All papers
should include a background survey and/or references to previous work.
If a paper contains no evaluation, the authors should explicitly explain
why. We encourage paper submissions related to the conference theme.
In particular, we encourage authors to include in their papers proposals
for, and discussions of, the relevance of their work to the theme.
However, a special session of the conference will also include
corpus-based and/or empirical work in all areas of natural language
processing.
Submission Format
Only hard-copy submissions will be accepted. Reviewing of papers will
not be blind. The submission format and word limit are the same as those
for ACL this year. We strongly recommend the use of ACL-standard LaTeX
(plus bibstyle and trivial example) or Word style files for the
preparation of submissions. Six copies of the full-length paper (not to exceed
3200 words, excluding references) should be received at the following
address on or before March 31, 1999.
EMNLP/VLC-99 Program Committee
c/o Pascale Fung
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Important Dates
March 31 Submission of full-length paper
April 30 Acceptance notice
May 20 Camera-ready paper due
June 21-22 Conference date
Program Chair
Pascale Fung
Human Language Technology Center
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Tel: (+852) 2358 8537
Fax: (+852) 2358 1485
Email: pascale at ee.ust.hk
Program Co-Chair
Joe Zhou
LEXIS-NEXIS, a Division of Reed Elsevier
9555 Springboro Pike
Dayton, OH 45342
USA
Email: joez at lexis-nexis.com
Program Committee (partial list)
Jiang-Shin Chang (Behavior Design Corp.)
Ken Church (AT&T Labs--Research)
Ido Dagan (Bar-Ilan University)
Marti Hearst (UC-Berkeley)
Changning Huang (Tsinghua University)
Pierre Isabelle (Xerox Research Europe)
Lillian Lee (Cornell University)
David Lewis (AT&T Research)
Dan Melamed (West Law Research)
Masaaki Nagata (NTT)
Steve Richardson (Microsoft Research)
Richard Sproat (AT&T Labs--Research)
Andreas Stolcke (SRI)
Ralph Weischedel (BBN)
Dekai Wu (Hong Kong University of Science & Technology)
David Yarowsky (Johns Hopkins University)