[Corpora-List] Rovereto Twitter N-Gram Corpus

Amac Herdagdelen amac at herdagdelen.com
Mon Jan 9 19:33:10 UTC 2012


Dear Corpora List Members,

I'm excited to announce that the Rovereto Twitter N-Gram Corpus (RTC),
an n-gram dataset of Twitter messages with gender labels for the
authors and the time of posting, is publicly available under a CC
license. The corpus is based on 75 million English tweets collected
from Twitter's public stream between December 2010 and July 2011.
Instead of the full text of the tweets, frequency statistics of n-grams
are provided. For each n-gram, the frequencies are broken down by the
gender of the authors and by posting time (i.e., day of the week and
hour of the day). For details, please visit the corpus homepage:
http://clic.cimec.unitn.it/amac/twitter_ngram/

Thanks,

Amaç Herdağdelen
