[Corpora-List] Twitter datasets available.
Miles Osborne
miles at inf.ed.ac.uk
Mon Jun 28 18:17:36 UTC 2010
We have made available for download various Twitter-related material:
--97 Million Tweets
--meta-information about the users who posted the Tweets
--Tweets annotated as corresponding to a news event, spam, or otherwise.
http://demeter.inf.ed.ac.uk/
The first two sets of data are anonymised; more details about the
construction can be found here:
Sasa Petrovic, Miles Osborne and Victor Lavrenko. The Edinburgh
Twitter Corpus. Computational Linguistics in a World of Social Media
(workshop at NAACL), Los Angeles, USA. June 2010.
http://www.iccs.inf.ed.ac.uk/~osborne/papers/socmed10.pdf
The events dataset is from a later period and was used in our NAACL 2010 paper:
Sasa Petrovic, Miles Osborne and Victor Lavrenko. Streaming First
Story Detection with application to Twitter. NAACL, Los Angeles, USA.
June 2010.
http://www.iccs.inf.ed.ac.uk/~osborne/papers/naacl10a.pdf
Miles
Sasa
Victor