[Corpora-List] Parsed corpora of contemporary English

Mark Davies Mark_Davies at byu.edu
Mon Mar 3 16:54:23 UTC 2008


I'm looking for information on *parsed* corpora of English:

-- parsed (not just POS-tagged)
-- contemporary, not historical, texts
-- 1,000,000 words or more
-- publicly available

I'm already aware of:

-- Penn Treebank (WSJ 1987-89)
   -- Treebank II (WSJ, Brown, etc)
   -- Treebank III (WSJ, Brown, Switchboard, etc)
-- ICE-GB
-- Diachronic Corpus of Present-Day Spoken English
-- Linguist's Search Engine
-- VISL corpora
(Smaller than 1,000,000 words)
-- Lancaster Parsed Corpus
-- SUSANNE
-- Polytechnic of Wales Corpus
(From http://ucrel.lancs.ac.uk/corpora.html; publicly available??)
-- American Printing House for the Blind Treebank
-- Associated Press Treebank
-- Canadian Hansard Treebank
-- IBM Manuals Treebank
-- Anaphoric Treebank

Any other suggestions? Thanks in advance for your help.

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
