[Corpora-List] congressional-speech dataset available

Thu Dec 14 06:02:59 UTC 2006

The "congressional speech" corpus and associated graph information
used in our EMNLP 2006 paper, "Get out the vote: Determining support or
opposition from Congressional floor-debate transcripts", are now
available.

Specifically, the data includes speeches as individual documents,
together with:

    * automatically derived labels for whether each speaker supported
      or opposed the legislation under discussion, allowing for
      experiments with this kind of sentiment analysis

    * indications of which debate each speech comes from (and the
      position within the debate), allowing for consideration of
      conversational structure

    * indications of by-name references between speakers, allowing for
      experiments with agreement classification (if one determines the
      "true" labels from the support/oppose labels assigned to the
      pair of speakers in question)

    * the edge weights and other information we derived to create the
      graphs used in our experiments on this data, facilitating
      implementation of alternative graph-based classification methods
      on the same graphs
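
As a rough illustration of the agreement-classification idea in the
third item, the sketch below derives an "agree"/"disagree" label for
each by-name reference by comparing the support/oppose labels of the
two speakers involved. The speaker identifiers, the dictionary and list
layouts, and the function name are hypothetical and chosen only for
this example; they do not reflect the actual file format of the
distribution.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    stance: Dict[str, str],
    references: List[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Label each by-name reference as agreement (True) or disagreement (False).

    stance maps a (hypothetical) speaker id to "support" or "oppose".
    references lists (referring speaker, referenced speaker) pairs.
    Pairs where either speaker has no stance label are skipped.
    """
    labels = []
    for source, target in references:
        if source not in stance or target not in stance:
            continue  # cannot derive a label without both stances
        labels.append((source, target, stance[source] == stance[target]))
    return labels


# Toy data, invented for illustration only.
stance = {"speaker_a": "support", "speaker_b": "oppose", "speaker_c": "support"}
references = [
    ("speaker_a", "speaker_b"),
    ("speaker_a", "speaker_c"),
    ("speaker_b", "speaker_d"),  # speaker_d has no stance label
]
pairs = agreement_labels(stance, references)
```

Here a pair counts as agreement exactly when both speakers carry the
same support/oppose label, which is one straightforward reading of the
parenthetical above; other definitions are possible.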

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee