Corpora: Looking for syntactically parsed corpora in English, French, and German

Rene.Valdes at lhsl.com
Wed Aug 1 21:32:51 UTC 2001


Both parsers were developed using data available from the Penn Treebank, a
syntactically annotated corpus that includes the Wall Street Journal (WSJ)
Penn Treebank Corpus and the Penn Treebank Brown Corpus.

The Penn Treebank Project annotates naturally occurring text for linguistic
structure. Most notably, we produce skeletal parses showing rough syntactic
and semantic information -- a bank of linguistic trees. We also annotate
text with part-of-speech tags, and for the Switchboard corpus of telephone
conversations, dysfluency annotation. We are located in the LINC Laboratory
of the Computer and Information Science Department at the University of
Pennsylvania.
All data produced by the Treebank is released through the Linguistic Data
Consortium.
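
For readers who have not seen a treebank, the "skeletal parses" mentioned
above are usually stored as bracketed trees. The sketch below is only an
illustration: the example sentence is invented rather than taken from the
corpus, the function names are hypothetical, and real Treebank files carry
extra conventions (an outer wrapping bracket, traces, function tags) that
this does not handle.

```python
def tokenize(text):
    """Split a bracketed tree string into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed constituent into a (label, children) tuple."""
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return read()


def tagged_words(tree):
    """Collect (word, part-of-speech tag) pairs from the leaves."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


# Invented example in a simplified bracketed style (an assumption, not corpus data).
EXAMPLE = "(S (NP (DT The) (NN parser)) (VP (VBZ reads) (NP (NNS trees))))"

tree = parse(tokenize(EXAMPLE))
print(tree[0])             # label of the root constituent
print(tagged_words(tree))  # words paired with their part-of-speech tags
```

The same tree yields both levels of annotation described above: the nested
labels give the syntactic structure, and the innermost labels give the
part-of-speech tags.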
