13.2484, FYI: American National Corpus, New Corpora from LDC

Tue Oct 1 01:46:28 UTC 2002

LINGUIST List:  Vol-13-2484. Mon Sep 30 2002. ISSN: 1068-4875.

Subject: 13.2484, FYI: American National Corpus, New Corpora from LDC

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Consulting Editor:
        Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, Arizona U.
	James Yuells, EMU		Marie Klopfenstein, WSU
	Michael Appleby, EMU		Heather Taylor, EMU
	Ljuba Veselinova, Stockholm U.	Richard John Harvey, EMU
	Dina Kapetangianni, EMU		Renee Galvis, WSU
	Karolina Owczarzak, EMU		Anita Huang, EMU
	Tomoko Okuno, EMU		Steve Moran, EMU
	Lakshmi Narayanan, EMU		Sarah Murray, WSU
	Marisa Ferrara, EMU

Software: Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
          Zhenwei Chen, E. Michigan U. <chen at linguistlist.org>
	  Prashant Nagaraja, E. Michigan U. <prashant at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: James Yuells <james at linguistlist.org>

=================================Directory=================================

1)
Date:  Fri, 27 Sep 2002 12:57:41 -0400
From:  Nancy Ide <ide at cs.vassar.edu>
Subject:  ACL papers in the American National Corpus

2)
Date:  Mon, 30 Sep 2002 13:28:31 -0400
From:  LDC Office <ldc at ldc.upenn.edu>
Subject:  New Corpora from the LDC

-------------------------------- Message 1 -------------------------------

Date:  Fri, 27 Sep 2002 12:57:41 -0400
From:  Nancy Ide <ide at cs.vassar.edu>
Subject:  ACL papers in the American National Corpus

The American National Corpus Consortium, with permission from the
Association for Computational Linguistics, will include in the American
National Corpus a selection of recent papers written by American authors
and published in ACL proceedings and anthologies. Any authors who object
to having their papers included in the American National Corpus should
contact Nancy Ide (ide at cs.vassar.edu) to have their papers removed.

Note that this applies to papers whose authors are native speakers of
American English only.

=======================================================

Nancy Ide

Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 845 437-5988 Fax: +1 845 437-7498
ide at cs.vassar.edu

Chercheur Associe
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-les-Nancy FRANCE
Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
ide at loria.fr

=======================================================

-------------------------------- Message 2 -------------------------------

Date:  Mon, 30 Sep 2002 13:28:31 -0400
From:  LDC Office <ldc at ldc.upenn.edu>
Subject:  New Corpora from the LDC

	     *   AQUAINT English News Text   *

    *   2001 NIST Speaker Recognition Evaluation   *

The Linguistic Data Consortium (LDC) is pleased to announce the
availability of two new corpora.

			   *

The AQUAINT English News Text corpus consists of English newswire text
drawn from three sources: the Xinhua News Service (People's Republic of
China), the New York Times News Service, and the Associated Press
Worldstream News Service. It was prepared by the LDC for the AQUAINT
Project, and will be used in official benchmark evaluations conducted by
the National Institute of Standards and Technology (NIST).

This two-disc publication contains roughly 375 million words, corresponding
to about 3 GB of data. The text data are separated into directories by
source (apw, nyt, xie); within each source, data files are subdivided by
year, and within each year, there is one file per date of collection.
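As a rough illustration of the layout described above (source, then
year, then one file per collection date), the following Python sketch
counts the daily files under each source and year. It is only a sketch
under stated assumptions: the function name count_daily_files and the
corpus root path are hypothetical, and nothing is assumed about the
actual file names or the markup inside the files.

```python
from collections import Counter
from pathlib import Path

# Source directory names as given in the announcement.
SOURCES = ("apw", "nyt", "xie")


def count_daily_files(corpus_root):
    """Count files per (source, year) under an assumed
    <root>/<source>/<year>/<file> layout.

    The layout follows the announcement's description; the file
    naming scheme is not assumed, so every regular file in a year
    directory is counted as one date of collection.
    """
    counts = Counter()
    root = Path(corpus_root)
    for source in SOURCES:
        source_dir = root / source
        if not source_dir.is_dir():
            continue  # tolerate a partial copy (e.g. only one disc mounted)
        for year_dir in sorted(source_dir.iterdir()):
            if not year_dir.is_dir():
                continue
            n_files = sum(1 for p in year_dir.iterdir() if p.is_file())
            counts[(source, year_dir.name)] = n_files
    return counts
```

Calling count_daily_files on the directory where the discs were copied
(a location chosen by the user, not specified by the LDC) would give a
quick check that each source and year is present.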

For further information, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002T31.html

Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $1000.

			   *

The 2001 NIST Speaker Recognition Evaluation is part of an ongoing
series of yearly evaluations conducted by NIST. These evaluations
provide an important contribution to the direction of research efforts
and the calibration of technical capabilities. They are intended to be
of interest to all researchers working on the general problem of
text-independent speaker recognition.

The single CD-ROM 2001 NIST Speaker Recognition Evaluation corpus is
based entirely on conversational cellular telephone speech collected by
the LDC.  The files are divided into evaluation and development data.
There are a total of 2,350 compressed speech files, all of which are
in SPHERE format.
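For readers unfamiliar with SPHERE, a file begins with a plain ASCII
header: the line "NIST_1A", a line giving the header size in bytes
(commonly 1024), then "name -type value" fields, ending with
"end_head". The Python sketch below parses only that header. It is an
illustration, not an LDC or NIST tool: the function name
parse_sphere_header is hypothetical, and no attempt is made to decode
the compressed sample data that follows the header.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict.

    `data` should hold at least the header bytes from the start of
    the file. Integer (-i) and real (-r) fields are converted;
    string fields (-sN) are kept as text.
    """
    lines = data.split(b"\n")
    if len(lines) < 2 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file: missing NIST_1A marker")
    header_size = int(lines[1].strip())  # total header length in bytes
    header = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in header.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore anything that is not "name -type value"
        name, field_type, value = parts
        if field_type == "-i":
            fields[name] = int(value)
        elif field_type == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value
    return fields
```

Reading the first few kilobytes of a file in binary mode and passing
them to this function is enough to inspect fields such as the sample
rate or sample coding, whichever fields a given file actually carries.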

For further information, including a link to the 2001 NIST Speaker
Recognition Evaluation website, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002S34.html

Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $400.

			   *

If you need additional information before placing your order, or
would like to inquire about membership in the LDC, please send email to
<ldc at ldc.upenn.edu> or call (215) 573-1275.

------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
3600 Market Street                  Fax:   (215) 573-2175
Suite 810                           email: ldc at ldc.upenn.edu
Philadelphia, PA 19104-2653         www: http://www.ldc.upenn.edu
