[Corpora-List] on processing Junk e-mail
Anabela Barreiro
barreiro_anabela at hotmail.com
Sun Jul 25 19:55:19 UTC 2010
Thank you, Paula!
That's good to know! :)
I use free software (Avast) and got no such warning. Either way, in this case there was no real danger, except for the annoyance :)
I have NOT read the e-mail or clicked on the link. I intentionally responded to Siddhartha's request to block the sender with a copy to all the members of this list, because I thought the discussion would be interesting! My conclusion is that there is still a lot to be done to improve software, and linguists would like to help and can help! Perhaps there are as many unemployed linguists as there are software companies developing products that need them, even without realizing it... I would even dare to say that we could employ all the unemployed linguists and that would not be enough... :)
Regards,
Anabela.
From: paulan at earthlink.net
To: barreiro_anabela at hotmail.com; grvsmth at panix.com; corpora at uib.no
Subject: Re: [Corpora-List] on processing Junk e-mail
Date: Sun, 25 Jul 2010 12:29:57 -0700
Anabela,
While not disputing the potential for increasing the amount of linguistic information used to detect spam and other email characteristics, I should mention that the spam message in question arrived with the subject annotated by Norton antivirus to the effect that no virus was found. In other words, while the message was not identified as spam by the operative spamblockers, it looked sufficiently suspicious for my antivirus processor to take a look at it.
Paula
----- Original Message -----
From: Anabela Barreiro
To: grvsmth at panix.com;corpora at uib.no
Sent: 7/25/2010 3:30:18 PM
Subject: Re: [Corpora-List] on processing Junk e-mail
Dear Angus,
I agree with what you said about personal e-mails and the risk of false positives when filtering on the Subject of the e-mail, because personal e-mails have a much wider variety of topics and friends can have quite an imagination :) However, what kind of business e-mail, or e-mail for a list that discusses important issues (like the Corpora list), would start with "hey"? Probability 0, I would say!
Then there are simple combinations of the e-mail provider with the Subject that could work well too.
But I find it interesting and challenging to create a sophisticated program that sorts e-mails by subject matter: one that looks into the body of the message, analyses combinations of words and linguistic constructions (not just n-grams), and classifies the messages (including spam). While this might not be a worthwhile investment for most ordinary users, it definitely would be for big international companies, and such linguistically enriched software would be worth much more than just sorting and classifying e-mails. I believe that kind of software would help filter out quite a lot of garbage, prioritise important e-mails and intelligently sort them by topic. Perhaps some software of this kind already exists. The work done for it could be reused by many other applications.
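As a rough illustration of the simple rule mentioned above (an informal greeting at the start of the Subject, combined with the sender's e-mail provider), here is a minimal Python sketch. The greeting list, the provider list and all function names are assumptions made up for this example; they do not come from any existing filter, and a real system would need far more evidence than this.

```python
import re

# Hypothetical lists, chosen only for illustration.
INFORMAL_GREETINGS = ("hey", "hi", "hello", "yo")
FREE_PROVIDERS = {"hotmail.com", "yahoo.com", "gmail.com"}


def sender_domain(address):
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def starts_with_greeting(subject):
    """True if the Subject, after any Re:/Fw:/Fwd: prefixes, begins with an informal greeting."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)*", "", subject, flags=re.IGNORECASE)
    words = re.findall(r"[a-z]+", stripped.lower())
    return bool(words) and words[0] in INFORMAL_GREETINGS


def looks_suspicious(sender, subject):
    """Flag a message only when both cues agree: informal Subject and free webmail sender."""
    return starts_with_greeting(subject) and sender_domain(sender) in FREE_PROVIDERS


if __name__ == "__main__":
    print(looks_suspicious("someone@hotmail.com", "hey"))
    print(looks_suspicious("member@hotmail.com", "Re: [Corpora-List] on processing Junk e-mail"))
```

Requiring both cues at once is a deliberate choice here: either cue alone would flag many legitimate messages, which is the false positive problem discussed above.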
Regards,
Anabela
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora