[Corpora-List] Parsed corpora of contemporary English
Mark Davies
Mark_Davies at byu.edu
Mon Mar 3 16:54:23 UTC 2008
I'm looking for information on *parsed* corpora of English:
-- parsed (not just POS-tagged)
-- contemporary, not historical, texts
-- 1,000,000 words or more
-- publicly available
I'm already aware of:
-- Penn Treebank (WSJ 1987-89)
-- Treebank II (WSJ, Brown, etc)
-- Treebank III (WSJ, Brown, Switchboard, etc)
-- ICE-GB
-- Diachronic Corpus of Present-Day Spoken English
-- Linguist's Search Engine
-- VISL corpora
(Smaller than 1,000,000 words)
-- Lancaster Parsed Corpus
-- SUSANNE
-- Polytechnic of Wales Corpus
(From http://ucrel.lancs.ac.uk/corpora.html; are these publicly available?)
-- American Printing House for the Blind Treebank
-- Associated Press Treebank
-- Canadian Hansard Treebank
-- IBM Manuals Treebank
-- Anaphoric Treebank
Any other suggestions? Thanks in advance for your help.
Mark Davies
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora