13.2484, FYI: American National Corpus, New Corpora from LDC
LINGUIST List
linguist at linguistlist.org
Tue Oct 1 01:46:28 UTC 2002
LINGUIST List: Vol-13-2484. Mon Sep 30 2002. ISSN: 1068-4875.
Subject: 13.2484, FYI: American National Corpus, New Corpora from LDC
Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Consulting Editor:
Andrew Carnie, U. of Arizona <carnie at linguistlist.org>
Editors (linguist at linguistlist.org):
Karen Milligan, WSU             Naomi Ogasawara, Arizona U.
James Yuells, EMU               Marie Klopfenstein, WSU
Michael Appleby, EMU            Heather Taylor, EMU
Ljuba Veselinova, Stockholm U.  Richard John Harvey, EMU
Dina Kapetangianni, EMU         Renee Galvis, WSU
Karolina Owczarzak, EMU         Anita Huang, EMU
Tomoko Okuno, EMU               Steve Moran, EMU
Lakshmi Narayanan, EMU          Sarah Murray, WSU
Marisa Ferrara, EMU
Software: Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
Zhenwei Chen, E. Michigan U. <chen at linguistlist.org>
Prashant Nagaraja, E. Michigan U. <prashant at linguistlist.org>
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: James Yuells <james at linguistlist.org>
=================================Directory=================================
1)
Date: Fri, 27 Sep 2002 12:57:41 -0400
From: Nancy Ide <ide at cs.vassar.edu>
Subject: ACL papers in the American National Corpus
2)
Date: Mon, 30 Sep 2002 13:28:31 -0400
From: LDC Office <ldc at ldc.upenn.edu>
Subject: New Corpora from the LDC
-------------------------------- Message 1 -------------------------------
Date: Fri, 27 Sep 2002 12:57:41 -0400
From: Nancy Ide <ide at cs.vassar.edu>
Subject: ACL papers in the American National Corpus
The American National Corpus Consortium, with permission from the
Association for Computational Linguistics, will include in the American
National Corpus a selection of recent papers written by American authors
and published in ACL proceedings and anthologies. Any authors who object
to having their papers included in the American National Corpus should
contact Nancy Ide (ide at cs.vassar.edu) to have their papers removed.
Note that this applies to papers whose authors are native speakers of
American English only.
=======================================================
Nancy Ide
Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 845 437-5988 Fax: +1 845 437-7498
ide at cs.vassar.edu
Chercheur Associe
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-les-Nancy FRANCE
Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
ide at loria.fr
=======================================================
-------------------------------- Message 2 -------------------------------
Date: Mon, 30 Sep 2002 13:28:31 -0400
From: LDC Office <ldc at ldc.upenn.edu>
Subject: New Corpora from the LDC
* AQUAINT English News Text *
* 2001 NIST Speaker Recognition Evaluation *
The Linguistic Data Consortium (LDC) is pleased to announce the
availability of two new corpora.
*
The AQUAINT English News Text corpus consists of English newswire text
drawn from three sources: the Xinhua News Service (People's Republic of
China), the New York Times News Service, and the Associated Press
Worldstream News Service. It was prepared by the LDC for the AQUAINT
Project, and will be used in official benchmark evaluations conducted by
the National Institute of Standards and Technology (NIST).
This two-disc publication contains roughly 375 million words, corresponding
to about 3 GB of data. The text data are separated into directories by
source (apw, nyt, xie); within each source, data files are subdivided by
year, and within each year, there is one file per date of collection.
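
The layout described above (source, then year, then one file per
collection date) can be sketched as a small index. This is only an
illustration: the announcement gives the three source directory names,
but the exact path depth and the file names used here are assumptions
and should be checked against the discs themselves.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Source directory names as given in the announcement.
SOURCES = {"apw", "nyt", "xie"}


def index_by_source_and_year(paths):
    """Group relative paths of the ASSUMED form <source>/<year>/<file>."""
    index = defaultdict(list)
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Skip anything that does not match the assumed three-level layout.
        if len(parts) != 3 or parts[0] not in SOURCES or not parts[1].isdigit():
            continue
        source, year, _filename = parts
        index[(source, int(year))].append(raw)
    return dict(index)


# Hypothetical file names, for illustration only.
example_paths = [
    "apw/1998/day_a.txt",
    "apw/1998/day_b.txt",
    "nyt/1999/day_a.txt",
    "docs/readme.txt",
]
index = index_by_source_and_year(example_paths)
for key in sorted(index):
    print(key, len(index[key]))
```

Keying on (source, year) mirrors the directory nesting, so selecting
one source for one year is a single lookup.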
For further information, please visit:
http://www.ldc.upenn.edu/Catalog/LDC2002T31.html
Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $1000.
*
The 2001 NIST Speaker Recognition Evaluation is part of an ongoing
series of yearly evaluations conducted by NIST. These evaluations
provide an important contribution to the direction of research efforts
and the calibration of technical capabilities. They are intended to be
of interest to all researchers working on the general problem of
text-independent speaker recognition.
The single CD-ROM 2001 NIST Speaker Recognition Evaluation corpus is
based entirely on conversational cellular telephone speech collected by
the LDC. The files are divided into evaluation and development data.
There are a total of 2,350 compressed speech files, all of which are
in SPHERE format.
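
For readers unfamiliar with SPHERE, the following is a minimal sketch
of reading the plain-text header that begins a NIST SPHERE file. It
assumes the usual header conventions (a "NIST_1A" line, a header-size
line, "name -type value" fields, and a closing "end_head"); the sample
header and its field values are made up for illustration and are not
taken from this corpus. The sketch reads the header only and does not
decompress or decode the speech samples.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the text header at the start of a NIST SPHERE file."""
    first, second, _rest = data.split(b"\n", 2)
    if first.strip() != b"NIST_1A":
        raise ValueError("not a SPHERE header")
    # The second line holds the total header size in bytes.
    header_size = int(second.decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, kind, value = parts
        if kind == "-i":        # integer field
            fields[name] = int(value)
        elif kind == "-r":      # real-valued field
            fields[name] = float(value)
        else:                   # "-sN": string field of N bytes
            fields[name] = value
    return fields


# Made-up header for illustration, padded with spaces to the declared size.
sample = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 8000\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024)
header = parse_sphere_header(sample)
print(header)
```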
For further information, including a link to the 2001 NIST Speaker
Recognition Evaluation website, please visit:
http://www.ldc.upenn.edu/Catalog/LDC2002S34.html
Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $400.
*
If you need additional information before placing your order, or
would like to inquire about membership in the LDC, please send email to
<ldc at ldc.upenn.edu> or call (215) 573-1275.
--------------------------------------------------------------------
Linguistic Data Consortium     Phone: (215) 573-1275
3600 Market Street             Fax:   (215) 573-2175
Suite 810                      email: ldc at ldc.upenn.edu
Philadelphia, PA 19104-2653    www:   http://www.ldc.upenn.edu
---------------------------------------------------------------------------
LINGUIST List: Vol-13-2484