[Corpora-List] New Corpora from the LDC

LDC Office ldc at ldc.upenn.edu
Mon Sep 30 17:19:35 UTC 2002


	     *   AQUAINT English News Text   *

    *   2001 NIST Speaker Recognition Evaluation   *


The Linguistic Data Consortium (LDC) is pleased to announce the
availability of two new corpora.

			   *

The AQUAINT English News Text corpus consists of English newswire text
drawn from three sources: the Xinhua News Service (People's Republic of
China), the New York Times News Service, and the Associated Press
Worldstream News Service. It was prepared by the LDC for the AQUAINT
Project and will be used in official benchmark evaluations conducted by
the National Institute of Standards and Technology (NIST).

This two-disc publication contains roughly 375 million words, amounting
to about 3 GB of data. The text data are separated into directories by
source (apw, nyt, xie); within each source, data files are subdivided by
year, and within each year, there is one file per date of collection.
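As a rough illustration of that layout, the Python sketch below groups
data files by source and year. It assumes only the three-level
<root>/<source>/<year>/<file> structure described above; the root path
and the function name are hypothetical, and no particular file-naming
scheme is assumed.

```python
from collections import defaultdict
from pathlib import Path


def index_by_source_and_year(root):
    """Group files by (source, year), assuming root/<source>/<year>/<file>.

    Hypothetical helper; file names are returned as-is, not parsed.
    """
    index = defaultdict(list)
    # "*/*/*" matches entries exactly three levels below root.
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file():
            source = path.parent.parent.name  # e.g. apw, nyt, xie
            year = path.parent.name
            index[(source, year)].append(path.name)
    return dict(index)
```

Since there is one file per collection date, the length of each list
gives the number of collection dates for that source and year.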

For further information, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002T31.html

Institutions holding LDC membership for the 2002 Membership Year can
receive this corpus free of charge. Nonmembers may purchase this
publication for $1000.

			   *

The 2001 NIST Speaker Recognition Evaluation is part of an ongoing
series of yearly evaluations conducted by NIST. These evaluations help
guide the direction of research efforts and calibrate technical
capabilities. They are intended to be of interest to all researchers
working on the general problem of text-independent speaker recognition.

The 2001 NIST Speaker Recognition Evaluation corpus, published on a
single CD-ROM, is based entirely on conversational speech collected by
the LDC over cellular telephones. The files are divided into evaluation
and development data; there are 2,350 compressed speech files in total,
all in SPHERE format.
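For readers unfamiliar with SPHERE: a file begins with a plain ASCII
header, conventionally the line NIST_1A, then the header length in
bytes, then one "name -type value" line per field, terminated by
end_head. The sketch below reads such a header in Python. It assumes
that convention; the function name is hypothetical, and the field names
and values in any example (sample_rate, sample_coding) are illustrative
rather than taken from this corpus. It reads only the header and does
not decompress the audio samples.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a SPHERE file (NIST_1A layout assumed)."""
    magic, size_line, _ = data.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST_1A SPHERE header")
    header_size = int(size_line.strip())  # total header length in bytes
    fields = {}
    # Skip the magic and size lines, then read "name -type value" lines.
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("ascii", errors="replace").strip()
        if line == "end_head":
            break
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # blank or malformed line
        name, typ, value = parts
        if typ == "-i":  # integer field
            fields[name] = int(value)
        elif typ == "-r":  # real-valued field
            fields[name] = float(value)
        elif typ.startswith("-s") and typ[2:].isdigit():
            fields[name] = value[: int(typ[2:])]  # -sN: string of N chars
    return fields
```

To use it on a real file, pass the bytes read from the start of the
file opened in binary mode.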

For further information, including a link to the 2001 NIST Speaker
Recognition Evaluation website, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002S34.html

Institutions holding LDC membership for the 2002 Membership Year can
receive this corpus free of charge. Nonmembers may purchase this
publication for $400.

			   *

If you need additional information before placing your order, or
would like to inquire about membership in the LDC, please send email to
<ldc at ldc.upenn.edu> or call (215) 573-1275.

	
--------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
3600 Market Street                  Fax:   (215) 573-2175
Suite 810                           email: ldc at ldc.upenn.edu
Philadelphia, PA 19104-2653         www: http://www.ldc.upenn.edu


