[Corpora-List] old/modern english corpus data??

George Walkden george.walkden at manchester.ac.uk
Fri Nov 9 09:21:09 UTC 2012


Dear Jungsoo,

There's also the Parsed Corpus of Early English Correspondence (PCEEC), freely available via the Oxford Text Archive: http://www-users.york.ac.uk/~lang22/PCEEC-manual/index.htm.

It has 2.2 million words from 1410-1695. A bit earlier than the ones Kat mentions, but it has the advantage of being POS-tagged (though not lemmatized).

Best,

 - George

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
George Walkden
Lecturer in English Linguistics
University of Manchester
george.walkden at manchester.ac.uk
http://personalpages.manchester.ac.uk/staff/george.walkden/
Office: N1.2 Samuel Alexander Building
Tel.: +44 (0)161 275 8905
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

On 9 Nov 2012, at 01:00, "K Gupta" <k.e.gupta at gmail.com> wrote:

Dear Jungsoo,

You may find the following helpful:

Corpus of Late Modern English Texts - https://perswww.kuleuven.be/~u0044428/
It comprises two sections: the Corpus of Late Modern English Texts (CLMET) and the Corpus of Late Modern English Texts Extended Version (CLMETEV). Both consist of texts arranged in the following time periods: 1710-1780, 1780-1850, and 1850-1920. The texts are varied in terms of genre, ranging from personal letters to literary fiction to scientific writing, but the corpus inevitably contains more formal prose.

Zurich English Newspaper Corpus - http://www.helsinki.fi/varieng/CoRD/corpora/ZEN/index.html
It contains 349 complete newspaper issues published between 1661 and 1791, amounting to 1.6 million words.

The Lampeter Corpus of Early Modern English Tracts - http://ota.ox.ac.uk/headers/2400.xml
Tracts and pamphlets published between 1640 and 1740, organised into the categories of religion, politics, economy and trade, science, law, and miscellaneous. There are 120 different texts, amounting to 1.1 million words.


Best wishes,
Kat

On 9 November 2012 00:23, Jungsoo Kim <jungsookim0845 at gmail.com> wrote:
Does anyone know where to find freely available online Old/Modern English corpora whose data are from before 1800 (the Google Books corpora are not ideal for me)? It would be more than wonderful if they had a search function that enables us to search the data by word, lemma, and part of speech.

I would be really grateful for any sort of help,
Jungsoo

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

