[Corpora-List] Automatically checking a treebank for errors

Ann Bies annbies at yahoo.com
Sun Jun 19 14:15:42 UTC 2011


Hi, Kevin,

There is also some work at LDC that may be relevant. It uses a TAG-based decomposition of the treebank to compare syntactic structures:


http://papers.ldc.upenn.edu/ACL2011/DerivationTrees_TBErrorDetection.pdf



Seth Kulick, Ann Bies, and Justin Mott

Using Derivation Trees for Treebank Error Detection

ACL 2011, Portland, Oregon, USA, June 19-24, 2011

Thanks,

Ann


--- On Fri, 6/17/11, Kevin B. Cohen <kevin.cohen at gmail.com> wrote:

From: Kevin B. Cohen <kevin.cohen at gmail.com>
Subject: [Corpora-List] Automatically checking a treebank for errors
To: "Corpora List" <corpora at uib.no>
Date: Friday, June 17, 2011, 3:47 PM

Does anyone know of any tricks for automatically checking a Penn
Treebank-style corpus for obvious errors?  I've done some simple stuff
in the past for checking POS tags, like looking for punctuation marks
with non-punctuation tags, which turned out to be really fruitful, but
I can't think of anything clever to do for the syntactic structures.
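
A check of the kind described above (punctuation tokens carrying non-punctuation tags) might be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names, the regular expression, and the choice of which tags count as acceptable for punctuation (the Penn Treebank punctuation tags plus SYM and -NONE-) are all assumptions, and the input is assumed to be bracketed trees in plain text.

```python
import re
import string

# A preterminal looks like "(TAG token)" with no nested parentheses inside it.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

# Assumption: tags considered acceptable on a punctuation token. Adjust this
# set to the annotation guidelines of the corpus being checked.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-", "$", "#",
              "SYM", "-NONE-"}

# Escaped bracket tokens as they are commonly written in bracketed files.
BRACKET_TOKENS = {"-LRB-", "-RRB-", "-LCB-", "-RCB-", "-LSB-", "-RSB-"}


def is_punctuation(token):
    """True if the token is an escaped bracket or only ASCII punctuation."""
    return token in BRACKET_TOKENS or all(
        ch in string.punctuation for ch in token
    )


def suspicious_punctuation(tree_text):
    """Return (tag, token) pairs where punctuation has an unexpected tag."""
    return [
        (tag, token)
        for tag, token in PRETERMINAL.findall(tree_text)
        if is_punctuation(token) and tag not in PUNCT_TAGS
    ]


if __name__ == "__main__":
    example = "(S (NP (DT The) (NN dog)) (VP (VBD barked)) (NN .))"
    for tag, token in suspicious_punctuation(example):
        print(f"check: token {token!r} tagged {tag}")
```

Anything this reports is only a candidate for manual review. Tokens such as "%" or a bare possessive apostrophe can legitimately carry non-punctuation tags, so the accepted tag set would need tuning per corpus.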

Kev

-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora