28.825, FYI: Over 95 Million Wikipedia Discussion Comments
The LINGUIST List
linguist at listserv.linguistlist.org
Mon Feb 13 20:50:16 UTC 2017
LINGUIST List: Vol-28-825. Mon Feb 13 2017. ISSN: 1069 - 4875.
Subject: 28.825, FYI: Over 95 Million Wikipedia Discussion Comments
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
Michael Czerniakowski)
Homepage: http://linguistlist.org
Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================
Date: Mon, 13 Feb 2017 15:49:52
From: Melody Kramer [mkramer at wikimedia.org]
Subject: Over 95 Million Wikipedia Discussion Comments
Yesterday, Wikipedia released two data sets: the largest annotated dataset of
online personal attacks, and a corpus of over 95 million user and article talk
comments made on Wikipedia between 2001 and 2015.
More information at:
https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/
Both data sets are available on FigShare, a research repository where users
can share data, to support further research:
https://figshare.com/projects/Wikipedia_Talk/16731
If you’re interested in collaborating with the Wikimedia Foundation on
research in this area, you can find documentation on formal collaborations
here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
You can also follow @WikiResearch on Twitter for news and updates on research,
datasets, and APIs from Wikimedia projects, or contact the Wikimedia Research
team at research-wmf at wikimedia.org.
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics