[Corpora-List] Input requested (brief survey): CARBON Treebank

Emily M. Bender ebender at uw.edu
Mon Oct 3 17:40:24 UTC 2011


[ Please forward to other interested parties ...
 apologies for cross-posting ]

Dear Colleagues,

We are preparing a grant application to the NSF Computing
Research Infrastructure program to fund the preparation of a
treebank of 14.6 million words of the Open American National
Corpus (Ide 2008).  This treebank will be prepared on the basis
of the English Resource Grammar (Flickinger 2000, 2011) using
the Redwoods (Oepen et al. 2004) methodology in which the
grammar creates a parse forest and the annotators select
the intended tree.  In particular, we will produce 1 million words
of hand-verified trees and an additional 13.6 million words whose
trees are automatically selected, with an expected exact-match
parse selection accuracy of over 80% by the end of the
project.

The treebank will include scripts to export selected views
of the information, including:

--- A variety of POS tagsets
--- Constituent structures, again with a variety of node label sets
--- Dependency structures, again in a variety of popular formats
--- MRS semantic representations (Copestake et al. 2005)

As part of our grant application, we are conducting a survey to
better understand how this resource could be useful to the field.

Please take a few moments to answer the questions at
this link:

https://www.surveymonkey.com/s/YB7LVKT

Many thanks,
Emily Bender, University of Washington
Dan Flickinger, Stanford University
