Corpora: corpora categories

Christopher Cieri ccieri at ldc.upenn.edu
Tue May 2 18:22:57 UTC 2000


Linda,

LDC's Topic Detection and Tracking (TDT) corpora categorize tens of
thousands of newswire, radio and television stories according to the
news topics they discuss. The TDT-PILOT and TDT-2 corpora have already
been released. The catalog pages are, respectively:
    http://www.ldc.upenn.edu/Catalog/LDC98T25.html
    http://www.ldc.upenn.edu/Catalog/LDC99T37.html
The TDT-3 corpus will be released in 2000.

Note, however, that "topic" is defined more narrowly in TDT than in the
examples you gave. Rather than offer bandwidth-consuming details here, I
give a simple example below and encourage interested readers to visit
the project's WWW pages at:
    http://www.ldc.upenn.edu/Projects/TDT

    Example TDT topic
    ***************
    83. World AIDS Conference
    Seminal Event:
    WHAT: 12th World AIDS Conference
    WHERE: Geneva, Switzerland
    WHEN: 28 June 1998
    TOPIC EXPLICATION:
    The 12th World AIDS Conference opened in Geneva, and was attended by
    international speakers concerned with the continuing spread of the
    AIDS epidemic. Stories on topic may cover reports on panel
    discussions, preparations made for the conference, concluding
    proposals, suggestions and possible actions towards international
    legislation to address the continuing spread of the virus. Reports
    that are solely on medical advancements in the fight against AIDS
    that bear no linkage to the conference are not on topic.
    RELATED RULE OF INTERPRETATION #11
    Related Article: NYT19980628.0108  More examples: Yes, Brief.
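
For anyone who wants to handle topic definitions like this one in a
program, here is a minimal Python sketch. It assumes only the plain-text
layout shown in the example (a "NN. Title" line followed by FIELD: value
lines and an explication paragraph). The names TdtTopic and parse_topic
are hypothetical, not part of any LDC tool, and the files distributed
with the corpora may use a different format.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical record mirroring the fields of the example topic.
# Assumption: this is only the layout shown in this post, not the LDC file format.
@dataclass
class TdtTopic:
    number: int
    title: str
    what: str
    where: str
    when: date
    explication: str
    related_article: str

def parse_topic(text):
    """Parse a topic written in the plain-text layout used in this post."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = re.match(r"(\d+)\.\s+(.+)", lines[0])  # e.g. "83. World AIDS Conference"
    if header is None:
        raise ValueError("expected 'NN. Title' on the first line")
    fields = {}
    explication = []
    in_explication = False
    for ln in lines[1:]:
        if ln.startswith("TOPIC EXPLICATION:"):
            in_explication = True
            continue
        if ln.startswith(("RELATED RULE", "Related Article:")):
            in_explication = False  # the explication paragraph ends here
        if in_explication:
            explication.append(ln)
            continue
        key, sep, value = ln.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return TdtTopic(
        number=int(header.group(1)),
        title=header.group(2),
        what=fields["WHAT"],
        where=fields["WHERE"],
        when=datetime.strptime(fields["WHEN"], "%d %B %Y").date(),
        explication=" ".join(explication),
        # The story ID is taken as the first token after "Related Article:".
        related_article=fields["Related Article"].split()[0],
    )

# Abbreviated copy of the example topic (explication shortened).
SAMPLE = """
83. World AIDS Conference
Seminal Event:
WHAT: 12th World AIDS Conference
WHERE: Geneva, Switzerland
WHEN: 28 June 1998
TOPIC EXPLICATION:
The 12th World AIDS Conference opened in Geneva, and was attended by
international speakers concerned with the continuing spread of the
AIDS epidemic.
RELATED RULE OF INTERPRETATION #11
Related Article: NYT19980628.0108  More examples: Yes, Brief.
"""

if __name__ == "__main__":
    topic = parse_topic(SAMPLE)
    print(topic.number, topic.when, topic.related_article)
    # prints: 83 1998-06-28 NYT19980628.0108
```

Deciding whether a given story is on topic is still a human annotation
judgment under the rules of interpretation; the sketch only captures the
structure of the topic definition itself.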

Best wishes,
Chris
--
Christopher Cieri
Executive Director, Linguistic Data Consortium
3615 Market Street, Philadelphia, PA 19104-2608 USA
phone: 215-573-5489, fax: 215-573-2175
mailto:Christopher.Cieri at ldc.upenn.edu
http://www.ldc.upenn.edu



More information about the Corpora mailing list