[Corpora-List] CNN Transcripts
Mark Davies
Mark_Davies at byu.edu
Wed Nov 16 17:31:10 UTC 2005
Has anyone here done much with the CNN transcripts:
http://transcripts.cnn.com/TRANSCRIPTS/ ?
I'm aware of one publication (below), but would be interested in others
as well:
Hoffmann, Sebastian. "From Web-Page to Mega-Corpus: The CNN
Transcripts." In: Marianne Hundt, Nadja Nesselhauf and Carolin Biewer
(eds.) Corpus Linguistics and the Web. Amsterdam: Rodopi.
I'm also aware of some LDC corpora that contain CNN transcripts, but in
general these appear to be either from the newspaper or from scripted
news broadcasts, e.g.:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98T25
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T11
At any rate, even though the genre/register of these transcripts is
fairly homogeneous, they do contain more than 170 million words of
unscripted spoken English, so it seems like it might be a nice resource.
Thanks in advance for any information that you might have.
Mark Davies
=================================================
Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
http://davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
=================================================