[Corpora-List] Texts 1900-1970
Chris Butler
csblists at telefonica.net
Thu Dec 15 07:55:02 UTC 2005
My thanks to the following people, who all provided information on the
availability of texts: Wendy Anderson, Carmela Chateau, Constantin Orasan,
Raf Salkie, Dirk Siepmann, Pedro Ureña, Romain Vanoudheusden. The sources
which were suggested are as follows:
There are old (and some recent) texts at Project Gutenberg.
www.gutenberg.org/
The Public Library of Science (PLoS) has open-access texts.
http://www.plos.org/about/openaccess.html
A selection of online math textbooks
http://www.math.gatech.edu/~cain/textbooks/onlinebooks.html
The Intratext digital library (contains many religious texts as well as a
great deal of literature)
http://www.intratext.com/
The SCOTS Corpus (which is freely accessible and searchable at
www.scottishcorpus.ac.uk) contains texts in Scottish English (as well as
dialects of Scots), from 1940 to the present day.
The New York Times Archive
(http://pqasb.pqarchiver.com/nytimes/advancedsearch.html) goes back to the
19th century.
The collection of texts hosted by archive.org
(http://www.archive.org/details/texts) includes texts from Project
Gutenberg.
The Victorian Literary Studies archive at
http://victorian.lang.nagoya-u.ac.jp/index.html, which has a list of authors
at http://victorian.lang.nagoya-u.ac.jp/concordance.html
The archive at www.questia.com
******
I'd also like to mention the Corpus of Late Modern English Texts compiled by
Hendrik de Smet at the Catholic University of Leuven
(http://perswww.kuleuven.be/~u0044428/), a principled collection of texts
(10 million words, 1720-1920) drawn from archives such as Project Gutenberg
and the Oxford Text Archive. A username and password must be obtained from
Hendrik (Hendrik.desmets at arts.kuleuven.be) in order to access the corpus.
Chris Butler
Honorary Professor, University of Wales Swansea, UK