14.602, Calls: Text Summarization/Chinese Lang Processing

LINGUIST List linguist at linguistlist.org
Sun Mar 2 15:51:51 UTC 2003

LINGUIST List:  Vol-14-602. Sun Mar 2 2003. ISSN: 1068-4875.

Subject: 14.602, Calls: Text Summarization/Chinese Lang Processing

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

To post to LINGUIST, use our convenient web form at


Date:  Wed, 26 Feb 2003 17:14:26 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Text Summarization Workshop & Document Understanding Conference

Date:  Wed, 26 Feb 2003 17:27:01 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Second Sighan Workshop on Chinese Language Processing

-------------------------------- Message 1 -------------------------------

Date:  Wed, 26 Feb 2003 17:14:26 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Text Summarization Workshop & Document Understanding Conference


                            HLT-NAACL Text
                        Summarization Workshop
             Document Understanding Conference (DUC 2003)

                       May 31 and June 1, 2003
                         Edmonton, AB, Canada


Given that the ACL'03 deadline is tomorrow and that most other
HLT-NAACL'03 workshop deadlines are not until early March, the
submission deadline for the HLT-NAACL'03 Text Summarization Workshop
has been extended by a week to March 7.


- March  7, 2003 - submissions due
- March 28, 2003 - authors notified
- April 10, 2003 - camera-ready papers due

Please visit the workshop site for submission details and additional
information.

The co-chairs,
Dragomir Radev (radev at umich.edu)
Simone Teufel (Simone.Teufel at cl.cam.ac.uk)

-------------------------------- Message 2 -------------------------------

Date:  Wed, 26 Feb 2003 17:27:01 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Second Sighan Workshop on Chinese Language Processing

Second Sighan Workshop on Chinese Language Processing
July 11-12, 2003

Sapporo Convention Center
Sapporo, Japan

               Second CALL FOR WORKSHOP PAPERS

 SIGHAN, a Special Interest Group of the Association for
Computational Linguistics, invites the submission of papers for its
second workshop to be held in conjunction with ACL-03 in Sapporo,
Japan. Papers are invited on substantial, original, and unpublished
research on all aspects of Chinese language processing, including, but
not limited to:
  word segmentation, POS tagging, and parsing;
  discourse, dialogue, and natural language interfaces;
  lexical semantics, word sense disambiguation, and lexicon acquisition;
  generation and summarization;
  cross-lingual information retrieval and machine translation.

 Papers should describe original work; they should emphasize completed
work rather than intended work, and should indicate clearly the state
of completion of the reported results. Wherever appropriate, concrete
evaluation results should be included.

 The reviewing of the papers will be blind. Submissions will be judged
on correctness, originality, technical strength, significance and
relevance to the workshop, and interest to the attendees. A paper
accepted for presentation at this workshop cannot be presented or have
been presented at any other meeting with publicly available published

 We allow simultaneous paper submission to the workshop and the ACL
main conference. If a paper is accepted by both the conference and the
workshop, the paper will be presented at the conference, rather than
at the workshop. The author(s) should notify the workshop chairs by
May 1 so that proper arrangement can be made.

 Submissions should follow the same style as regular ACL papers. For
details about formatting, go to
http://www.ec-inc.co.jp/ACL2003/ and click on "Call for
Papers". Submissions should not exceed 8 pages including the

 Submissions should be made online at
http://www.sighan.org/swclp2/submit. If you have trouble submitting
your paper online, please email the PDF and/or PostScript version of
the paper to qma at crl.go.jp AND feixia at us.ibm.com. Note that the
PDF/PS file should NOT include authors' names and affiliations, as
the reviewing process will be blind.

  - Paper submission deadline: March 10, 2003
  - Notification of acceptance: April 20, 2003
  - Camera-ready paper deadline: May 25, 2003
  - Workshop: July 11-12, 2003


Please watch the web site http://www.sighan.org/swclp2 for
developments. You may also contact Qing Ma (qma at crl.go.jp) or Fei Xia
(feixia at us.ibm.com) with questions regarding the workshop.

For people who need visas to come to Japan, please go to ACL-03's
website (http://www.ec-inc.co.jp/ACL2003/) and click on "Applying for
visas to Japan" on the left side of the page for more information.


Qing Ma - Communications Research Lab, Japan (co-chair)
Fei Xia - IBM, USA (co-chair)

Joyce Chai - Michigan State Univ, USA
Keh-Jiann Chen - Academia Sinica, Taiwan
Zhendong Dong - Hownet designer, China
Tom Emerson - Basis Technology Corp, USA
Changning Huang - Microsoft, China
Chu-ren Huang - Academia Sinica, Taiwan
K.L. Kwok - Queens College, USA
Tom Lai - City Univ. of Hong Kong
Dekang Lin - Univ of Alberta, Canada
Kim-Teng Lua - National University of Singapore
Masaki Murata - Communications Research Laboratory, Japan
Martha Palmer - Univ. of Pennsylvania, USA
Shimei Pan - IBM, USA
Fuji Ren - Tokushima University, Japan
Bangalore Srinivas - ATT, USA
Keh-Yih Su - Behavior Design Corporation, Taiwan
Maosong Sun - Tsinghua University, China
Bing Swen - Peking University, China
Tan Chew Lim - National University of Singapore
Benjamin Tsou - City Univ. of Hong Kong
Amy Weinberg - Univ of Maryland, USA
Andi Wu - Microsoft, USA
Dekai Wu - Hong Kong Science and Technology University
Nianwen Xue - Univ. of Pennsylvania, USA
Jin Yang - Systran, USA
Shiwen Yu - Peking University, China
Qiang Zhou - Tsinghua University, China

--------------------------------------------------------------------------



A Chinese word segmentation bakeoff is to be held as part of the
Second Meeting of SIGHAN (the ACL Special Interest Group on Chinese
Language Processing), July 11-12, 2003 (in conjunction with ACL 2003)
in Sapporo, Japan.


There is a large literature on the topic of segmenting Chinese
text into words, and many approaches have been proposed. However, one
problem has been that it is very difficult to compare the results of
different approaches, since researchers have not been testing their
systems on common test corpora. While it is recognized that there is
no single correct segmentation, and different applications may require
different segmentations, it is nonetheless desirable to be able to
compare different segmentation algorithms on common datasets so that
one can understand which algorithms are most promising, independent of
a particular application.

We aim to address this issue by inviting researchers who work on
Chinese word segmentation to put their systems to the test on a common
set of training and test corpora. The results of this competition will
be published and it is hoped that the results will provide fodder for
future work in this area.
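
As an illustration of what comparing segmenters on a common test set
can look like, here is a minimal sketch in Python. It assumes
word-level precision, recall, and F1 computed over character spans,
which is a common way to score segmentation; this announcement does
not say how entries will actually be scored. The function names and
the sample sentence are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def score_segmentation(gold_words, system_words):
    """Return (precision, recall, f1) for one sentence.

    Assumption: a system word counts as correct only if both of its
    boundaries match a word in the hand-segmented (gold) version.
    """
    if "".join(gold_words) != "".join(system_words):
        raise ValueError("gold and system segmentations cover different text")
    gold = word_spans(gold_words)
    system = word_spans(system_words)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: the system wrongly splits one two-character word.
gold = ["我", "喜欢", "北京"]
system = ["我", "喜", "欢", "北京"]
precision, recall, f1 = score_segmentation(gold, system)
```

Because each corpus follows its own segmentation standard, scores of
this kind are only comparable between systems tested on the same
corpus.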


Training and test corpora will come from four sources:

1) The Academia Sinica (Taiwan) treebank (Taiwan Big5 encoding).

2) The Beijing University Institute of Computational Linguistics
   Corpus (GB encoding).

3) The Penn Chinese treebank (GB encoding).

4) Hong Kong City University corpus (HK Big5 encoding).

Each of these corpora has been hand-segmented according to its own
standard.  Sizes of training and test corpora are to be determined and
will depend upon the amounts available from the four sources.
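
Since the four corpora use different character encodings, a
participant's tools will need to decode each one appropriately. The
sketch below shows one way this might be done in Python. The codec
choices are assumptions: "big5" for Taiwan Big5, "gb2312" for GB, and
"big5hkscs" for HK Big5; the actual files may call for other codecs
(for example "gbk"). The corpus keys are hypothetical labels, not
official names.

```python
# Assumed mapping from corpus label (hypothetical) to Python codec name.
CORPUS_CODECS = {
    "academia_sinica": "big5",
    "beijing_university": "gb2312",
    "penn_chinese_treebank": "gb2312",
    "hk_city_university": "big5hkscs",
}


def decode_corpus_bytes(raw, corpus):
    """Decode raw corpus bytes to a Unicode string using the assumed codec."""
    return raw.decode(CORPUS_CODECS[corpus])


# Round-trip a short sample through each codec to show that the same
# text is stored as different byte sequences in different corpora.
sample = "中文"
encoded = {corpus: sample.encode(codec) for corpus, codec in CORPUS_CODECS.items()}
decoded = {corpus: decode_corpus_bytes(raw, corpus) for corpus, raw in encoded.items()}
```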

Participants will be able to elect to be tested on any or all of the
corpora, except that participants from the institutions providing the
corpora will not be allowed to test on their own corpus.

In addition to electing one or more corpora, participants will also be
able to participate in either or both of two tracks: a Restricted
Track and an Unrestricted Track. For the Restricted Track, the
participant will be
allowed to use ONLY the materials from the training corpus
corresponding to each elected test corpus. For the Unrestricted Track,
the participants may use any resources they choose, including
proprietary dictionaries; however, participants will be required, in
their summaries (see below), to provide documentation on which of
their segmentation decisions were based on material other than the
training corpus or what their systems inferred algorithmically from
the training corpus.
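
To make the Restricted Track constraint concrete, here is a minimal
sketch of a segmenter whose only resource is a word list extracted
from the training corpus. It uses forward maximum matching, chosen
here purely for illustration and not prescribed by this announcement.
It also assumes the training data is Unicode text with words
separated by whitespace; the real corpus formats may differ. All
names and sample sentences are hypothetical.

```python
def build_lexicon(training_sentences):
    """Collect every word seen in whitespace-segmented training sentences."""
    lexicon = set()
    for sentence in training_sentences:
        lexicon.update(sentence.split())
    return lexicon


def max_match(text, lexicon):
    """Segment text greedily, taking the longest lexicon word at each position.

    Characters that start no known word are emitted as single-character words.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical training data and test sentence.
training = ["我 喜欢 北京", "北京 大学 很 大"]
lexicon = build_lexicon(training)
result = max_match("我喜欢北京大学", lexicon)
```

An Unrestricted Track entry could extend the lexicon with outside
dictionaries, in which case the summary would need to document that
use.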

The training and testing materials will be made available according to
a strict schedule as outlined below. Specific instructions on the
format of the segmented test data will be provided, and these
instructions must be followed exactly.

After the results are reported back to the participants, the
participants will be asked to provide a two-page summary of their
system for inclusion in the SIGHAN Workshop proceedings.

The bakeoff instructions (in both English and Chinese) can be found at
http://www.sighan.org/bakeoff2003/bakeoff_instr.html. More details of
the process will be posted in due course to the web page listed at the
end of this message.


MARCH 15, 2003: Training materials and complete instructions available
                at the website (see below), along with information on
                and references to the various segmentation standards.

APRIL 22, 2003: Testing materials available at the website.

APRIL 25, 2003: Segmented test materials due back to ftp site by
                5:00 PM, U.S. Eastern Daylight Time. The format of the
                returned segmentations must adhere to the guidelines
                that will be posted March 15.

MAY 10, 2003:   Bakeoff results announced privately to participants.

MAY 25, 2003:   Two-page system descriptions due.

JULY 11, 2003:  Full results published at the SIGHAN Workshop.


Please watch the web site


for developments. You may also contact Richard Sproat
(rws at research.att.com) with questions regarding the contest.
