Keely,<div><br></div><div>our tool WebBootCaT will do the scraping and deliver a corpus for you: point it at the music-review sites Joel mentions (under 'advanced options') and it will build a corpus from the pages there. You'll first need to self-register at <a href="http://www.sketchengine.co.uk">http://www.sketchengine.co.uk</a></div>
<div><br></div><div>Regards</div><div><br></div><div>Adam<br><br><div class="gmail_quote">On 11 November 2011 15:01, Tetreault, Joel <span dir="ltr">&lt;<a href="mailto:JTetreault@ets.org">JTetreault@ets.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Keely, instead of collecting music reviews from news corpora, it might be more effective to go to music review sites and scrape them off the webpage. Some good ones are:<br>
<br>
<a href="http://allmusic.com" target="_blank">allmusic.com</a> - one of the largest repositories of music reviews in the world. If you want to make a massive corpus, I would just scrape that.<br>
<br>
<a href="http://pitchfork.com" target="_blank">pitchfork.com</a> - has (indie) music reviews going back to 1999, and they have 5 or so album reviews a day. The album reviews section is here: <a href="http://pitchfork.com/reviews/albums/" target="_blank">http://pitchfork.com/reviews/albums/</a> They changed their site format a few months ago, but before that I scraped all the reviews to make a music review corpus. I could zip that up and send it to you, though it may require some post-processing here and there. There are over 10,000 reviews in that scrape.<br>
<br>
<a href="http://metacritic.com" target="_blank">metacritic.com</a> - is a review aggregator site for movies, music, games, etc. It has links to reviews on other websites and then normalizes the scores from each website to give a composite score.<br>
<br>
<a href="http://nme.com" target="_blank">nme.com</a> / <a href="http://spin.com" target="_blank">spin.com</a> / rollingstone - all have music reviews on their websites and are another good source for web scraping.<br>
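<br>
For anyone scripting the scrape by hand, the text-extraction step might look roughly like the sketch below. It is only an illustration, not the code used for the pitchfork scrape mentioned above: it uses Python's standard html.parser module, and the names ReviewTextExtractor and extract_text are hypothetical. It assumes the page has already been downloaded into a string, and it keeps all visible text, so site-specific filtering (navigation, ads, comments) would still be needed.<br>
<br>
```python
from html.parser import HTMLParser


class ReviewTextExtractor(HTMLParser):
    """Collect the visible text of an HTML page (hypothetical helper).

    Text inside script and style elements is skipped; everything else
    is kept, one whitespace-normalised chunk per text node.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # non-zero while inside script/style
        self._chunks = []      # cleaned text nodes, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Collapse runs of whitespace and drop empty nodes.
        text = " ".join(data.split())
        if text:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an already-downloaded HTML string."""
    parser = ReviewTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```
<br>
Downloading the pages is left out on purpose; each site's terms of use and robots.txt should be checked before fetching anything in bulk.<br>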
<br>
Joel<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Thu, 10 Nov 2011 14:51:38 -0500<br>
From: Keely &lt;<a href="mailto:km.mimnagh@gmail.com">km.mimnagh@gmail.com</a>&gt;<br>
Subject: [Corpora-List] North america newspaper corpus<br>
To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
Hi, I am a master's student running a study on the language of music<br>
critics. Does anyone know of a corpus that breaks newspapers down by<br>
section, so that I can parse just the music reviews?<br>
<br>
Any help would be much appreciated.<br>
<br>
<br>
Keely<br>
<font color="#888888"><br>
--<br>
Keely Mimnagh<br>
M.A. Candidate<br>
Music and Culture<br>
Carleton University<br>
Ottawa, Ontario<br>
_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</font></blockquote></div><br><br clear="all"><div><br></div>-- <br>========================================<br><a href="http://www.kilgarriff.co.uk/" target="_blank">Adam Kilgarriff</a> <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a> <br>
Director <a href="http://www.sketchengine.co.uk/" target="_blank">Lexical Computing Ltd</a> <br>Visiting Research Fellow <a href="http://leeds.ac.uk" target="_blank">University of Leeds</a> <div>
<i><font color="#006600">Corpora for all</font></i> with <a href="http://www.sketchengine.co.uk" target="_blank">the Sketch Engine</a> </div><div> <i><a href="http://www.webdante.com" target="_blank">DANTE: <font color="#009900">a lexical database for English</font></a><font color="#009900"> </font> </i><div>
========================================</div></div><br>
</div>