26.4825, FYI: The Signal Media One-Million News Articles Dataset

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Fri Oct 30 15:09:12 UTC 2015


LINGUIST List: Vol-26-4825. Fri Oct 30 2015. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================


Date: Fri, 30 Oct 2015 11:08:30
From: Dyaa Albakour [dyaa.albakour at signal.uk.com]
Subject: The Signal Media One-Million News Articles Dataset

 
Dataset Release: 

The Signal Media One-Million News Articles Dataset
http://research.signalmedia.co/newsir16/signal-dataset.html

We are delighted to announce the release of the Signal Media One-Million News Articles dataset: http://research.signalmedia.co/newsir16/signal-dataset.html

This dataset is intended to serve the community for research on news articles. It was collected by scraping a variety of news sources over a period of one month (1-30 September 2015) and contains approximately one million articles, mainly in English. Sources include major outlets, such as Reuters, as well as local news sources and blogs.

The release of this dataset accompanies the ECIR 2016 workshop on Recent Trends in News Information Retrieval (NewsIR'16, http://research.signalmedia.co/newsir16/), which will take place in Padua, Italy. The dataset can be used for submissions to the NewsIR'16 workshop, but it is intended to serve the community for research on news retrieval in general. The workshop aims to identify the current challenges and research directions for news retrieval tasks, and this dataset can support that work.

Potential retrieval tasks that can be studied with this data include (but are not limited to):

- detecting and summarising events over time
- identifying bias in news sources towards different topics and/or entities
- identifying influencers in media coverage and visualising information flow

To obtain the dataset, please follow this link:
http://goo.gl/forms/5i4KldoWIX

With regards,
Signal Media Research
research at signalmedia.co
http://signalmedia.co
@signalHQ
 



Linguistic Field(s): Computational Linguistics