[Corpora-List] E-mail corpora?

Dean Jones dean.m.jones at gmail.com
Wed Sep 20 19:22:32 UTC 2006


Hello all,

I'm looking for collections of e-mails which would be suitable for
training some NLP tools, and wondered if anyone on this list could
point me in the right direction. We're mainly interested in training
categorisation tools, but are also interested in performing other
kinds of analysis (e.g. POS tagging, named-entity extraction) to
compare the performance of our tools on e-mails and other kinds of
documents.

I know about the Enron corpus and a couple of spam corpora
(SpamAssassin, TREC Spam Track) - is there anything I'm missing out on? As
this is for a commercial project, I'm interested in hearing about both
free and commercial corpora. Our immediate interest is in
English-language documents, but other languages would also be of
longer-term interest.

Many thanks for any pointers,

Dean.


