Corpora: Multi-document summarisation data

Christopher Cieri ccieri at ldc.upenn.edu
Thu May 25 23:22:30 UTC 2000


Tassos,

The corpora we have which are most like what you need are the Topic Detection
and Tracking corpora. TDT-2 contains tens of thousands of stories from English
(and Mandarin) broadcast news and radio. LDC annotators defined 100 news
topics from stories selected at random from the corpus and then annotated each
of the stories for relevance to each of the 100 topics. The corpora contain
both the stories and the relevance table. For each topic, LDC also developed
topic definitions like the one that follows. These are written primarily to
guide the annotators, so I can't say how useful they'll be for your work. For
more information on the TDT-2 corpus, see:
http://www.ldc.upenn.edu/Projects/TDT2/

    90. Unwed Fathers' Law
    Seminal Event:
    WHAT: CA adopts law legalizing the use of paternity forms for unwed
    fathers
    WHERE: USA
    WHEN: January 1997
    TOPIC EXPLICATION:
    In 1997, law was passed in California that allows licensed hospitals to
    provide a "declaration of paternity" form for the unwed parents of a
    newborn to sign. This document makes unwed fathers legally responsible
    for the child. Signing the declaration form is voluntary. The
    declaration entitles children to the same rights and privileges as
    children born to married parents, and makes unwed fathers easier to
    track down should they become "deadbeat dads" (decline responsibility
    and leave the child and mother without financial support) The document
    provides the child with legal access to parental medical records, and
    the noncustodial parents' medical benefits. Several states have adopted
    the use of this document. Stories discussing cases invoking the use of
    this law, the developing partnership with clinics, county welfare
    offices, local vital records offices and courts, and related stories
    discussing the effects this may have nationally are on topic. The
    implementation of this law by other states as well as California are on
    topic if they specifically mention the CA law or the "declaration of
    paternity" form.
    RELATED RULE OF INTERPRETATION # 9
    Related Article: CNN19980626.2130.0558
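
As a rough illustration of how the relevance table could be turned into the
document clusters asked about, here is a minimal Python sketch. The row layout
(topic number, story ID, YES/NO label) and the placeholder story IDs STORY_A,
STORY_B and STORY_C are assumptions made for the example, not the actual TDT-2
file format; only the CNN story ID comes from the topic definition above.

```python
from collections import defaultdict

# Hypothetical relevance judgments: (topic_id, story_id, label).
# The real TDT-2 relevance table may be laid out differently; every row
# here is illustrative except the story ID quoted in topic 90's definition.
judgments = [
    (90, "CNN19980626.2130.0558", "YES"),
    (90, "STORY_A", "YES"),
    (90, "STORY_B", "NO"),
    (12, "STORY_C", "YES"),
]


def clusters_by_topic(rows):
    """Group the IDs of stories judged on topic under each topic ID."""
    clusters = defaultdict(list)
    for topic_id, story_id, label in rows:
        if label == "YES":
            clusters[topic_id].append(story_id)
    return dict(clusters)


clusters = clusters_by_topic(judgments)
for topic_id, story_ids in sorted(clusters.items()):
    print(topic_id, story_ids)
```

Each resulting list is one cluster of related stories. This only covers the
clustering half of the request: the topic definition is the closest thing to
a per-cluster description, and it is not a summary of the stories.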

Chris

Tassos Tombros wrote:

> Hello everybody.
>
> I am looking for a document collection for the purposes of multi-document
> summarisation. What I am looking for is clusters of related documents and
> a corresponding human-written summary for each of the clusters.
>
> Any help would be greatly appreciated.
>
> Thanks,
>
> Tassos
>
> --------------------------------------------------------------------------
> Tassos Tombros, F082                   Tel  : +44 (0)141 330 4971
> Department of Computing Science        Fax  : +44 (0)141 330 4913
> University of Glasgow                  e-mail:tombrosa at dcs.gla.ac.uk
> Glasgow G12 8RZ, UK                    http://www.dcs.gla.ac.uk/~tombrosa/

--
Christopher Cieri
Executive Director, Linguistic Data Consortium
3615 Market Street, Philadelphia, PA 19104-2608 USA
phone: 215-573-5489, fax: 215-573-2175
mailto:Christopher.Cieri at ldc.upenn.edu
http://www.ldc.upenn.edu
