8.1552, FYI: North American News Text Corpus from the LDC
The LINGUIST List
linguist at linguistlist.org
Wed Oct 29 17:18:31 UTC 1997
LINGUIST List: Vol-8-1552. Wed Oct 29 1997. ISSN: 1068-4875.
Subject: 8.1552, FYI: North American News Text Corpus from the LDC
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>
Review Editor: Andrew Carnie <carnie at linguistlist.org>
Associate Editor: Ljuba Veselinova <ljuba at linguistlist.org>
Assistant Editors: Martin Jacobsen <marty at linguistlist.org>
Brett Churchill <brett at linguistlist.org>
Anita Huang <anita at linguistlist.org>
Julie Wilson <julie at linguistlist.org>
Elaine Halleck <elaine at linguistlist.org>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Zhiping Zheng <zzheng at online.emich.edu>
Home Page: http://linguistlist.org/
Editor for this issue: Brett Churchill <brett at linguistlist.org>
=================================Directory=================================
1)
Date: Wed, 29 Oct 1997 13:36:56 EST
From: LDC Office <ldc at unagi.cis.upenn.edu>
Subject: A New Corpus from the Linguistic Data Consortium
-------------------------------- Message 1 -------------------------------
Date: Wed, 29 Oct 1997 13:36:56 EST
From: LDC Office <ldc at unagi.cis.upenn.edu>
Subject: A New Corpus from the Linguistic Data Consortium
Announcing a NEW CORPUS from the
LINGUISTIC DATA CONSORTIUM
North American News Text Corpus
The Linguistic Data Consortium (LDC) announces the availability
of a corpus of North American news text. This corpus is a
collection of journalistic text in English from newswire and
newspaper sources in the United States.
The North American News Text corpus consists of news text that
has been marked up with SGML. The text is taken from the
following sources:
Source                           Dates Covered    Approx. # Words (Millions)
----------------------------------------------------------------------------
Los Angeles Times &              05/94-08/97       52
  Washington Post
New York Times News Syndicate    07/94-12/96      173
Reuters News Service             04/94-12/96       85
  (General & Financial)
Wall Street Journal              07/94-12/96       40
----------------------------------------------------------------------------
Both the New York Times and the L.A. Times/Washington Post services
include a range of other newspaper sources in their syndicated
newswires. In addition to its two predominant sources, the
L.A. Times/Washington Post material includes smaller amounts of
text from the following sources:
Newsday
The Baltimore Sun
The Hartford Courant
The New York Times material consists mainly of New York Times
articles, but it also includes smaller amounts of text from the
following sources:
Bloomberg Business News
The Boston Globe
Los Angeles Daily News
Fort Worth Star-Telegram
Newsweek
Cox News Service
The Arizona Republic
Seattle Post-Intelligencer
San Francisco Examiner
Houston Chronicle
San Francisco Chronicle
Economist Newspaper Ltd.
Hearst Newspapers
Both newswire services also carry a small number of articles from
a wider range of miscellaneous sources; the sources listed above
are the ones that appear regularly on a daily basis.
Because of restrictions imposed by the copyright holders of the
news text, this corpus is available only to 1995, 1996 and 1997
LDC members. Members who wish to receive this corpus must sign
the North American News Text user agreement, which is available
on the Linguistic Data Consortium WWW home page at
http://www.ldc.upenn.edu/ldc/catalog/index.html.
If you would like to order a copy of this corpus, please email
your request to ldc at unagi.cis.upenn.edu. If you need additional
information before placing your order, or would like to inquire
about membership in the LDC, please send email or call (215)
898-0464.
---------------------------------------------------------------------------
LINGUIST List: Vol-8-1552