[Corpora-List] Corpora Digest, Vol 53, Issue 14

Adam Kilgarriff adam at lexmasterclass.com
Sun Nov 13 19:15:15 UTC 2011


Keely,

Our tool WebBootCaT will do the scraping and deliver a corpus for you.
You can point it at the music-review sites Joel mentions (under 'advanced
options') and it will build a corpus from the pages there. You'll first
need to self-register at http://www.sketchengine.co.uk

Regards

Adam
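For anyone who would rather scrape the review sites mentioned in this thread themselves than use a hosted tool, here is a minimal Python sketch using only the standard library. It rests on a hypothetical assumption: that each review's body text sits in <p> elements. Real sites (pitchfork.com, allmusic.com and so on) use their own markup and change it over time, as Joel notes below, so the extraction would need adapting per site. The sketch also checks robots.txt before fetching and pauses between requests. The names ParagraphExtractor, extract_paragraphs and polite_fetch are illustrative only, not part of WebBootCaT or any site's API.

```python
import time
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements, skipping <script> and <style>.

    Hypothetical assumption: a review's body text lives in <p> elements.
    Real review sites differ, so this would need adapting per site.
    """

    def __init__(self):
        super().__init__()  # default convert_charrefs=True turns &amp; etc. into text
        self.paragraphs = []
        self._buf = []
        self._in_p = False
        self._skip_depth = 0

    def _flush(self):
        # Close the current paragraph, normalising whitespace; drop empty ones.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # HTML allows an unclosed <p> before the next one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed


def extract_paragraphs(html):
    """Return the non-empty paragraph texts of one HTML page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def polite_fetch(url, user_agent="review-corpus-sketch", delay=2.0):
    """Fetch url only if the site's robots.txt allows it; pause afterwards."""
    parts = urlparse(url)
    robots = urllib.robotparser.RobotFileParser(
        urljoin(f"{parts.scheme}://{parts.netloc}/", "robots.txt"))
    robots.read()
    if not robots.can_fetch(user_agent, url):
        return None
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    time.sleep(delay)  # be gentle with the server between successive pages
    return html
```

To use it, collect the URLs of individual review pages yourself, then for each one call extract_paragraphs(polite_fetch(url) or "") and write the resulting paragraphs to one text file per review; the files together form the corpus. Keeping fetching and extraction separate means the per-site parsing can be rewritten without touching the download code.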

On 11 November 2011 15:01, Tetreault, Joel <JTetreault at ets.org> wrote:

> Hi Keely, instead of collecting music reviews from news corpora, it might
> be more effective to go to music review sites and scrape the reviews from
> their web pages.  Some good ones are:
>
> allmusic.com - one of the largest repositories of music reviews in the
> world.  If you want to make a massive corpus, I would just scrape that.
>
> pitchfork.com - has (indie) music reviews going back to 1999 and publishes
> five or so album reviews a day (the album reviews section is at
> http://pitchfork.com/reviews/albums/).  They changed their site format a
> few months ago, but before that I scraped all the reviews to make a music
> review corpus.  I could zip that up and send it to you, though it may
> require some post-processing here and there.  There are over 10,000 reviews
> in that scrape.
>
> metacritic.com - a review aggregator site for movies, music, games,
> etc.  It links to reviews on other websites and normalizes the scores
> from each website to give a composite score.
>
> nme.com / spin.com / rollingstone.com - all have music reviews on their
> websites, another good source for web scraping.
>
> Joel
>
> ------------------------------
>
> Message: 5
> Date: Thu, 10 Nov 2011 14:51:38 -0500
> From: Keely <km.mimnagh at gmail.com>
> Subject: [Corpora-List] North america newspaper corpus
> To: corpora at uib.no
>
> Hi, I am a master's student running a study on the language of music
> critics. Does anyone know of a corpus that breaks newspapers down by
> section, so that I can extract the music reviews?
>
> Any help would be much appreciated.
>
>
> Keely
>
> --
> Keely Mimnagh
> M.A. Candidate
> Music and Culture
> Carleton University
> Ottawa, Ontario
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================

