[Corpora-List] Clean Enron Anyone?
peetm
peet.morris at comlab.ox.ac.uk
Fri Mar 18 17:14:13 UTC 2005
Greets!
I'm wondering whether anyone has a 'cleaned' version of the Enron email
corpus?
In its raw state, most of the emails contain routing headers, footers,
disclaimers, etc. Plus, IMHO, some of the emails are spam.
If no one has a cleaned-up version, I am going to attempt the cleanup
myself. So, if anyone's interested in getting the output of that effort,
please let me know.
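
For readers wondering what such a cleanup might involve, here is a minimal sketch in Python. It is not the poster's method; it is one possible approach under stated assumptions. It keeps a small set of headers, discards the rest (routing headers included), and cuts the body at the first line that looks like a disclaimer or footer. The names `KEEP_HEADERS`, `FOOTER_MARKERS`, and `clean_message` are hypothetical, and the marker patterns are illustrative guesses that would need tuning against the actual corpus. Spam filtering is not covered.

```python
import email
import re

# Assumption: these are the only headers worth keeping for corpus work.
KEEP_HEADERS = ("From", "To", "Date", "Subject")

# Assumption: illustrative footer/disclaimer patterns, not derived from the corpus.
FOOTER_MARKERS = [
    re.compile(r"^\s*\*{5,}"),                                 # row of asterisks
    re.compile(r"this e-?mail .*confidential", re.IGNORECASE),  # disclaimer text
]


def clean_message(raw: str) -> dict:
    """Return the kept headers and the body text above the first footer marker."""
    msg = email.message_from_string(raw)

    # Keep only the whitelisted headers that are actually present.
    headers = {name: msg[name] for name in KEEP_HEADERS if msg[name] is not None}

    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are out of scope for this sketch.
        body = ""

    kept = []
    for line in body.splitlines():
        if any(pattern.search(line) for pattern in FOOTER_MARKERS):
            break  # everything from here on is treated as footer
        kept.append(line)

    return {"headers": headers, "body": "\n".join(kept).strip()}
```

Calling `clean_message(raw_text)` on one message returns a dictionary with `headers` and `body` keys. Cutting at the first marker is deliberately crude: it will also drop any genuine text that follows a disclaimer, so a real cleanup would want to inspect what gets removed.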
Have a nice weekend,
peetm