[Corpora-List] Requesting Weblog dataset for categorisation or topic detection

Miles Osborne miles at inf.ed.ac.uk
Fri Jun 28 12:13:13 UTC 2013


We have released our Twitter corpus for event detection:

http://demeter.inf.ed.ac.uk/cross/docs/fsd_corpus.tar.gz

Note that, as per the Twitter terms of service, you need to crawl the actual
Tweets yourself.  We provide Tweet IDs and relevance judgements (whether a
given Tweet is relevant to a given event).  The events themselves are listed
in the following paper:

Sasa Petrovic, Miles Osborne, Richard McCreadie, Craig Macdonald, Iadh
Ounis, Luke Shrimpton. Can Twitter replace Newswire for breaking news?
ICWSM, Boston US. July 2013.
http://homepages.inf.ed.ac.uk/miles/papers/short-breaking.pdf
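
For anyone wanting to work with the relevance judgements before crawling, a
minimal Python sketch of grouping Tweet IDs by event might look like the
following.  The file layout is an assumption made purely for illustration
(tab-separated rows of Tweet ID and event ID); the actual format inside the
archive may differ, and the function name and sample IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_relevant_tweets(lines):
    """Map each event ID to the set of Tweet IDs judged relevant to it.

    ASSUMPTION: each row is tab-separated as `tweet_id<TAB>event_id`.
    This layout is hypothetical; check the files in the archive first.
    """
    by_event = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        tweet_id, event_id = row[0].strip(), row[1].strip()
        by_event[event_id].add(tweet_id)
    return dict(by_event)


# Made-up sample data standing in for a judgements file.
sample = io.StringIO("1001\tevent_1\n1002\tevent_1\n\n1003\tevent_2\n")
judgements = group_relevant_tweets(sample)
```

The resulting Tweet IDs per event are what you would then pass to whatever
crawler you use to fetch the Tweets themselves.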

Miles
-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.