[Corpora-List] Clean Enron Anyone?

peetm peet.morris at comlab.ox.ac.uk
Fri Mar 18 17:14:13 UTC 2005


Greets!

I'm wondering whether anyone has a 'cleaned' version of the Enron email
corpus?

In its raw state, most of the emails contain routing headers, footers,
disclaimers, etc. Plus, IMHO, some of the emails are spam.

If no one has a cleaned-up version, I am going to attempt the cleanup
myself. If anyone's interested in getting the output of that effort,
please let me know.

Have a nice weekend,

peetm


