[Corpora-List] Automatically checking a treebank for errors
Paula Newman
paulan at earthlink.net
Sat Jun 18 19:07:27 UTC 2011
Kevin,
Re:
> Does anyone know of any tricks for automatically checking a Penn
> Treebank-style corpus for obvious errors?
One possibility is to strip the tags but maintain the dependency-based indentation, and visually check the results for obvious dependency errors.
This suggestion comes from my work on generating parser output for English documents that can be scanned quickly. The format is similar to that of the Penn Treebank, but even more flattened, and without tags. For example:
Stokely
says
{stores
revive
specials
like three cans
of peas
for 99 cents}
See http://www.aclweb.org/anthology-new/W/W05/W05-1101.pdf
for more info.
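A minimal sketch of the tag-stripping idea, assuming the input is a single bracketed Penn Treebank-style parse. It is not the method from the paper above: it indents each word by its constituent depth as a rough stand-in for dependency-based indentation, and the function name strip_tags is made up for illustration. As a side effect it also reports unbalanced brackets.

```python
def strip_tags(bracketed, indent="  "):
    """Return the words of a bracketed parse, one per line, indented by depth.

    Assumption: constituent nesting depth is used as a proxy for the
    dependency-based indentation described in the message.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    lines = []
    depth = 0
    label = None
    expect_label = False
    for tok in tokens:
        if tok == "(":
            depth += 1
            expect_label = True      # the next non-paren token is a label
        elif tok == ")":
            depth -= 1
            expect_label = False
        elif expect_label:
            label = tok              # remember the label, but do not print it
            expect_label = False
        elif label != "-NONE-":      # skip empty elements (traces)
            lines.append(indent * (depth - 1) + tok)
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return "\n".join(lines)


if __name__ == "__main__":
    print(strip_tags("(S (NP (NNP Stokely)) (VP (VBZ says)))"))
```

Words that end up at an unexpected depth relative to their neighbours are the ones worth a second look.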
Paula
> [Original Message]
> From: Kevin B. Cohen <kevin.cohen at gmail.com>
> To: Corpora List <corpora at uib.no>
> Date: 6/17/2011 4:47:59 PM
> Subject: [Corpora-List] Automatically checking a treebank for errors
>
> Does anyone know of any tricks for automatically checking a Penn
> Treebank-style corpus for obvious errors? I've done some simple stuff
> in the past for checking POS tags, like looking for punctuation marks
> with non-punctuation tags, which turned out to be really fruitful, but
> I can't think of anything clever to do for the syntactic structures.
>
> Kev
>
> --
> Kevin Bretonnel Cohen, PhD
> Biomedical Text Mining Group Lead, Computational Bioscience Program,
> U. Colorado School of Medicine
> 303-916-2417 (cell) 303-377-9194 (home)
> http://compbio.ucdenver.edu/Hunter_lab/Cohen
>
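The POS check mentioned in the quoted message (punctuation marks carrying non-punctuation tags) could be sketched as follows. The tag set PUNCT_TAGS and the function name suspicious_punctuation are assumptions made for illustration; adjust the set to the tag inventory of the corpus being checked, and treat the hits as candidates for review rather than confirmed errors.

```python
import re

# Assumed set of tags that may legitimately label a punctuation or symbol token.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-", "#", "$"}

# A leaf looks like "(TAG word)" with no nested parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def suspicious_punctuation(bracketed):
    """Return (word, tag) pairs where a non-alphanumeric token has an unexpected tag."""
    hits = []
    for tag, word in LEAF.findall(bracketed):
        looks_like_punct = not any(ch.isalnum() for ch in word)
        if looks_like_punct and tag not in PUNCT_TAGS and tag != "-NONE-":
            hits.append((word, tag))
    return hits


if __name__ == "__main__":
    tree = "(S (NP (NNP Stokely)) (VP (VBZ says)) (NN .))"
    for word, tag in suspicious_punctuation(tree):
        print(word, tag)
```

Tokens such as "%" may be tagged as nouns by convention, so some hits will be false positives.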