[Corpora-List] Searching for an email corpus - SUMMARY

Ute Römer ute.roemer at engsem.uni-hannover.de
Wed Apr 11 18:39:56 UTC 2007


Dear All, 
 
Here is a quick summary of the messages I got in response to my recent query
on email corpora. I'd like to thank the following list members for helpful
pointers: 
Stefan Bordag
Chris Jordan
Sabine Bartsch
Ramesh Krishnamurthy
 
Stefan Bordag mentioned the (huge) USENET corpus, which does not contain
emails but texts of a similar type (postings from internet discussion forums):
http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html
 
Chris Jordan suggested the SpamAssassin Corpus
(http://spamassassin.apache.org/).
 
Sabine Bartsch and Ramesh Krishnamurthy sent me a link to the Wolverhampton
junk email corpus (http://clg.wlv.ac.uk/projects/junk-email/); Sabine also
mentioned a corpus of email messages from the W3C mailing lists
(http://tides.umiacs.umd.edu/webtrec/trecent/parsed_w3c_corpus.html). 
 
I have now got plenty of corpus material to keep my 'Analysing Texts'
students busy... Thanks! 
 
Very best wishes... Ute
 
 
************************************************************
 
Dr. Ute Römer
English Department
Leibniz University of Hanover
Königsworther Platz 1
30167 Hannover
Germany
 
Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
Please note NEW e-mail address: ute.roemer at engsem.uni-hannover.de
http://www.uteroemer.com/
http://www.engsem.uni-hannover.de/angli/

 
 

