20.1529, Media: New York Times Article on NYT Annotated Corpus
LINGUIST Network
linguist at LINGUISTLIST.ORG
Wed Apr 22 14:58:27 UTC 2009
LINGUIST List: Vol-20-1529. Wed Apr 22 2009. ISSN: 1068 - 4875.
Subject: 20.1529, Media: New York Times Article on NYT Annotated Corpus
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Randall Eggert, U of Utah
<reviews at linguistlist.org>
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Matthew Lahrman <matt at linguistlist.org>
===========================Directory==============================
1)
Date: 22-Apr-2009
From: Evan Sandhaus < sandes at nytimes.com >
Subject: New York Times Article on NYT Annotated Corpus
-------------------------Message 1 ----------------------------------
Date: Wed, 22 Apr 2009 10:56:30
From: Evan Sandhaus [sandes at nytimes.com]
Subject: New York Times Article on NYT Annotated Corpus
Available under a noncommercial research license from the Linguistic Data
Consortium (LDC), the corpus spans 20 years of the newspaper, from 1987 to
2007 (that's 7,475 issues, to be exact). The collection includes the text
of 1.8 million articles written at The Times. Of these, more than 1.5
million have been manually annotated by The New York Times Index with
distinct tags for people, places, topics and organizations drawn from a
controlled vocabulary. In addition, 650,000 articles include summaries
written by indexers from The New York Times Index. The corpus is provided
as a collection of XML documents in the News Industry Text Format (NITF)
and includes open-source Java tools for parsing documents into
memory-resident objects.
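For readers who do not use the bundled Java tools, the following is a
minimal Python sketch of reading one NITF-style XML document into a plain
in-memory object. It is an illustration under stated assumptions, not the
corpus's own tooling: the sample document is hand-written and hypothetical,
the element paths (hedline/hl1 for the headline, block class="full_text"
for the body) are assumed from common NITF usage and may differ in the
actual corpus files, and parse_article is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in an NITF-like layout.
# It is NOT taken from the corpus; the real files may be structured differently.
SAMPLE_NITF = """<nitf>
  <head>
    <title>Sample Headline</title>
  </head>
  <body>
    <body.head>
      <hedline><hl1>Sample Headline</hl1></hedline>
    </body.head>
    <body.content>
      <block class="full_text">
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </block>
    </body.content>
  </body>
</nitf>
"""


def parse_article(xml_text):
    """Parse one NITF-style document into a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Assumed location of the headline; an empty string is used if it is absent.
    headline = root.findtext("./body/body.head/hedline/hl1", default="")
    # Assumed location of the article text: <p> elements inside the
    # block whose class attribute is "full_text".
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.findall("./body/body.content/block[@class='full_text']/p")
    ]
    return {"headline": headline.strip(), "paragraphs": paragraphs}


if __name__ == "__main__":
    article = parse_article(SAMPLE_NITF)
    print(article["headline"])
    for paragraph in article["paragraphs"]:
        print(paragraph)
```

The sketch uses only the Python standard library, so it can be adapted to
the real element names once a corpus file has been inspected.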
You can read more about the corpus at:
http://open.blogs.nytimes.com/2009/01/12/fatten-up-your-corpus/
All the best,
Evan Sandhaus
--
Semantic Technologist
Research & Development Operations
New York Times Company
Linguistic Field(s): Computational Linguistics
Lexicography
Semantics
Text/Corpus Linguistics
-----------------------------------------------------------