[Corpora-List] DiaView: Visualise Cultural Change in Diachronic Corpora using Google Books

David Beavan David.Beavan at glasgow.ac.uk
Mon Nov 21 16:22:53 UTC 2011


Dear all

Those interested in visualisations, salience/key words, or in using the Google Books Ngrams may like to take a look at:

http://www.scottishcorpus.ac.uk/corpus/diaview/

The Google Books Ngram corpus has proved very popular and has spawned many new visualisations. This is my take on automatically identifying culturally important issues over time. You don't need to start by searching for anything in particular: browsing is completely opportunistic. The technique goes beyond word frequency, striving to highlight salient terms rather than simply high-frequency ones. You can browse from 1850 to the present in five-year blocks or as individual years.
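
For anyone unfamiliar with the difference between salient and merely frequent terms, here is a minimal sketch of one common keyness measure, the log-likelihood (G2) score widely used in corpus linguistics. It is only an illustration of the general idea and is not necessarily how DiaView computes salience; the function names and the toy counts are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score for a single term.

    a: count of the term in the target period
    b: count of the term in the reference periods
    c: total tokens in the target period
    d: total tokens in the reference periods
    """
    # Expected counts if the term were equally common in both samples.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def salient_terms(target, reference, top_n=3):
    """Rank terms that are over-represented in `target` by G2 score."""
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for term, a in target.items():
        b = reference.get(term, 0)
        # Keep only terms relatively more frequent in the target period.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_n]]

# Hypothetical toy counts: one period versus all other periods.
target = {"the": 1000, "war": 50, "telegraph": 5}
reference = {"the": 10000, "war": 100, "telegraph": 40}
print(salient_terms(target, reference))
```

In this toy data "the" is by far the most frequent word in the target period, but it is no more common there than elsewhere, so it is not ranked as salient, while "war" is.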

There are many issues with the tool and the corpus: the OCR, sampling, lack of genre information, iffy metadata and more. That said, it's an opportunity to dig into 100 billion words across 150 years!

Hope you like it.

Dave

-- 
David Beavan
English Language Computing Manager
University of Glasgow
+44 (0)141 330 2382
http://www.scottishcorpus.ac.uk/
The University of Glasgow, charity number SC004401
