[Corpora-List] Automatically checking a treebank for errors

Paula Newman paulan at earthlink.net
Sat Jun 18 19:07:27 UTC 2011


Kevin,

Re:
> Does anyone know of any tricks for automatically checking a Penn
> Treebank-style corpus for obvious errors? 

One possibility is to strip the tags but maintain the dependency-based indentation, and then visually check the results for obvious dependency errors.

This suggestion comes from my work on generating quickly scannable parser results for English documents. The format is similar to that of the Penn Treebank, but even more flattened, and without tags. For example:

Stokely
says
      {stores
        revive
             specials
                   like three cans
                               of peas
                                for 99 cents}


See  http://www.aclweb.org/anthology-new/W/W05/W05-1101.pdf
for more info.
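
For a rough idea of how the tag stripping could be scripted, here is a minimal Python sketch. It is not the code behind the format above: it indents each word by its bracket nesting depth in a Penn Treebank-style string, which is only a stand-in for dependency-based indentation. The function name and the sample bracketing are made up for illustration. As a side effect, it raises an error on unbalanced brackets, which is itself a cheap automatic check.

```python
import re


def strip_tags_keep_indent(bracketed, indent=2):
    """Return one word per line, indented by bracket nesting depth.

    Assumption: any token that directly follows "(" is a label
    (phrase or POS tag) and is dropped; every other token is a word.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    lines = []
    depth = 0
    prev = None
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in bracketing")
        elif prev == "(":
            pass  # label right after an opening bracket: strip it
        else:
            lines.append(" " * (indent * depth) + tok)
        prev = tok
    if depth != 0:
        raise ValueError("unbalanced '(' in bracketing")
    return "\n".join(lines)


# Hypothetical bracketing for part of the example sentence.
sample = (
    "(S (NP (NNP Stokely)) (VP (VBZ says) (SBAR (S (NP (NNS stores)) "
    "(VP (VBP revive) (NP (NNS specials)))))))"
)
print(strip_tags_keep_indent(sample))
```

Words attached at the wrong level show up as indentation that looks out of step with their neighbors, which is what the visual scan relies on.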

Paula

> [Original Message]
> From: Kevin B. Cohen <kevin.cohen at gmail.com>
> To: Corpora List <corpora at uib.no>
> Date: 6/17/2011 4:47:59 PM
> Subject: [Corpora-List] Automatically checking a treebank for errors
>
> Does anyone know of any tricks for automatically checking a Penn
> Treebank-style corpus for obvious errors?  I've done some simple stuff
> in the past for checking POS tags, like looking for punctuation marks
> with non-punctuation tags, which turned out to be really fruitful, but
> I can't think of anything clever to do for the syntactic structures.
>
> Kev
>
> -- 
> Kevin Bretonnel Cohen, PhD
> Biomedical Text Mining Group Lead, Computational Bioscience Program,
> U. Colorado School of Medicine
> 303-916-2417 (cell) 303-377-9194 (home)
> http://compbio.ucdenver.edu/Hunter_lab/Cohen
>