[Corpora-List] Corpora Digest, Vol 53, Issue 14
Adam Kilgarriff
adam at lexmasterclass.com
Sun Nov 13 19:15:15 UTC 2011
Keely,
Our tool WebBootCaT will do the scraping and deliver a corpus for you.
You can point it at the music-review sites Joel mentions (under 'advanced
options') and it will then build a corpus from pages there. You'll first
need to self-register at http://www.sketchengine.co.uk.
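
If you would rather script the scraping by hand, the core step is pulling
the review text out of each page's HTML. Below is a rough sketch of that
step using only the Python standard library. It is not how WebBootCaT works
internally, and the names ReviewTextExtractor, extract_review and the
"review-text" class are made up for illustration; each real site uses its
own markup, so the target class would need adjusting per site.

```python
from html.parser import HTMLParser


class ReviewTextExtractor(HTMLParser):
    """Collect text found inside elements carrying a target CSS class.

    Hypothetical helper: "review-text" is an assumed class name, not one
    taken from any of the sites mentioned in this thread.
    """

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class="review-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []    # pieces of review text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start counting at a matching element; keep counting for its children.
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


def extract_review(html, target_class="review-text"):
    """Return the text inside the first-level matching elements, space-joined."""
    parser = ReviewTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample page, standing in for a downloaded review.
sample = """
<html><body>
<div class="nav">Home | Reviews</div>
<div class="review-text"><p>A <em>bold</em> debut.</p><p>Highly recommended.</p></div>
<div class="footer">Copyright</div>
</body></html>
"""

print(extract_review(sample))
```

To run this over real pages you would download each one first (for example
with urllib.request.urlopen) and pass the decoded HTML to extract_review.
It is worth checking a site's terms of use and robots.txt before scraping
it in bulk.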
Regards
Adam
On 11 November 2011 15:01, Tetreault, Joel <JTetreault at ets.org> wrote:
> Hi Keely, instead of collecting music reviews from news corpora, it might
> be more effective to go to music review sites and scrape them off the
> webpage. Some good ones are:
>
> allmusic.com - one of the largest repositories of music reviews in the
> world. If you want to make a massive corpus, I would just scrape that.
>
> pitchfork.com - has (indie) music reviews going back to 1999, and they
> have 5 or so album reviews a day. The album reviews section is here:
> http://pitchfork.com/reviews/albums/ They changed their site format a
> few months ago, but before that I scraped all the reviews to make a music
> review corpus. I could zip that up and send it to you, though it may
> require some post-processing here and there. There are over 10,000 reviews
> in that scrape.
>
> metacritic.com - is a review aggregator site for movies, music, games,
> etc. It has links to reviews on other websites and then normalizes the
> scores from each website to give a composite score.
>
> nme.com / spin.com / rollingstone.com - all have music reviews on their
> websites and are another good source for web scraping.
>
> Joel
>
> ------------------------------
>
> Message: 5
> Date: Thu, 10 Nov 2011 14:51:38 -0500
> From: Keely <km.mimnagh at gmail.com>
> Subject: [Corpora-List] North america newspaper corpus
> To: corpora at uib.no
>
> Hi, I am a master's student running a study on the language of music
> critics. Does anyone know of a corpus that breaks newspapers down by
> section, so that I can isolate the music reviews?
>
> Any help would be much appreciated.
>
>
> Keely
>
> --
> Keely Mimnagh
> M.A. Candidate
> Music and Culture
> Carleton University
> Ottawa, Ontario
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================