[Corpora-List] on processing Junk e-mail

Paula Newman paulan at earthlink.net
Sun Jul 25 19:29:57 UTC 2010


Anabela,
While not disputing the potential for increasing the amount of linguistic information used to detect spam and other e-mail characteristics, I should mention that the spam message in question arrived with the subject annotated by Norton antivirus to the effect that no virus was found. In other words, while the message was not identified as spam by the spam blockers in operation, it looked sufficiently suspicious for my antivirus processor to take a look at it.

Paula
----- Original Message ----- 
From: Anabela Barreiro 
To: grvsmth at panix.com;corpora at uib.no
Sent: 7/25/2010 3:30:18 PM 
Subject: Re: [Corpora-List] on processing Junk e-mail


Dear Angus,
 
I agree with what you said about personal e-mails and the risk of false positives when filtering on the Subject line, because personal e-mails cover a much wider variety of topics and friends can have quite an imagination :) However, what kind of business e-mail, or e-mail to a list that discusses important issues (like the Corpora list), would start with "hey"? Probability 0, I would say!

Then there are simple combinations of the e-mail provider with Subject that could work well too.
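
As a rough illustration of such a combination, here is a minimal Python sketch. It is only an assumption of how a rule like this might look, not a tested filter: the provider set, the list of casual openers, and the function names are all hypothetical and chosen for the example.

```python
import re

# Hypothetical values for illustration only; none of them come from this thread.
FREE_PROVIDERS = {"example-freemail.com"}
CASUAL_OPENERS = re.compile(r"^\s*(hey|hi there|hello)\b", re.IGNORECASE)
# Optional "Re:" prefixes followed by a list tag such as "[Corpora-List]".
LIST_TAG = re.compile(r"^\s*(re:\s*)*\[[^\]]+\]\s*", re.IGNORECASE)


def sender_domain(address):
    """Return the lower-cased part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].strip().lower()


def looks_suspicious(sender, subject, is_list_mail):
    """Flag a message using the Subject alone, or the Subject plus the provider."""
    bare = LIST_TAG.sub("", subject)          # drop a leading list tag, if any
    casual = bool(CASUAL_OPENERS.match(bare))
    if is_list_mail and casual:
        # A casual opener on a professional list is treated as suspicious.
        return True
    # For personal mail, require the provider signal as well, to limit
    # the false positives mentioned above.
    return casual and sender_domain(sender) in FREE_PROVIDERS
```

With these assumed values, a list message whose subject is "[Corpora-List] hey" would be flagged, while a personal "hey there" from an unlisted provider would not.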
 
But I find it interesting and challenging to create a sophisticated program that sorts e-mails by subject matter: one that looks into the body of the message, analyses combinations of words and linguistic constructions (not just n-grams), and classifies the messages (including as spam). While this might not be a worthwhile investment for most ordinary users, it definitely would be for big/international companies, and such linguistically enriched software would be worth much more than its use in sorting and classifying e-mails alone. I believe that kind of software would help filter out quite a lot of garbage, prioritise important e-mails, and intelligently sort them by topic. Perhaps some software of this kind already exists. The work done for it could be reused by many other applications.
 
Regards,


Anabela 