28.825, FYI: Over 95 Million Wikipedia Discussion Comments

The LINGUIST List linguist at listserv.linguistlist.org
Mon Feb 13 20:50:16 UTC 2017


LINGUIST List: Vol-28-825. Mon Feb 13 2017. ISSN: 1069 - 4875.

Subject: 28.825, FYI: Over 95 Million Wikipedia Discussion Comments

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================


Date: Mon, 13 Feb 2017 15:49:52
From: Melody Kramer [mkramer at wikimedia.org]
Subject: Over 95 Million Wikipedia Discussion Comments

Yesterday, Wikipedia released two data sets: the largest annotated dataset
of online personal attacks, and a corpus of all 95 million user and article
talk comments made on Wikipedia between 2001 and 2015.

More information at: 

https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/

Both data sets are available on FigShare, a research repository where users
can share data, to support further research:

https://figshare.com/projects/Wikipedia_Talk/16731

If you’re interested in collaborating with the Wikimedia Foundation on
research in this area, you can find documentation on formal collaborations
here:

https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations

You can also follow @WikiResearch on Twitter for news and updates on research,
datasets, and APIs from Wikimedia projects, or contact the Wikimedia Research
team at research-wmf at wikimedia.org.
 
Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics


