[Corpora-List] 2. Italian media corpora (Stefan Schneider)
Cristina Bosco
bosco at di.unito.it
Sat Nov 23 11:44:22 UTC 2013
Dear Stefan,
I would like to point out a corpus for your survey of Italian media corpora.
The corpus is called Senti-TUT and is annotated for sentiment analysis, with two main aims: developing a new resource for Italian and studying a particular linguistic device, irony. Since irony is often exploited in the political domain, the texts included in Senti-TUT were extracted from social media, specifically Twitter, by filtering with keywords related to politics. We collected messages posted during election season in Italy, after Mario Monti was nominated to replace Silvio Berlusconi as prime minister (from 6 October 2011 to 3 February 2012). A further set of data was extracted from the Twitter section of an Italian satirical blog, Spinoza. The corpus currently consists of around 4,400 tweets.
It will be made available for download in a few months, in a form allowed by Twitter's policies and under a Creative Commons license.
For more information:
- the Senti-TUT website: http://www.di.unito.it/~tutreeb/sentiTUT.html
- A. Gianti, C. Bosco, V. Patti, A. Bolioli et al., “Annotating Irony in a Novel Italian Corpus for Sentiment Analysis,” Proc. 4th Workshop on Corpora for Research on Emotion Sentiment and Social Signals, ELRA, 2012, pp. 1–7
- C. Bosco, V. Patti, A. Bolioli, “Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT,” IEEE Intelligent Systems 28(2), 2013, pp. 55–63
Thank you for compiling this very useful survey,
Cristina Bosco
(Dipartimento di Informatica, Università di Torino)
bosco at di.unito.it