[Corpora-List] Automatically checking a treebank for errors

Adriane Boyd adriane at ling.ohio-state.edu
Sat Jun 18 08:38:48 UTC 2011


Hi Kevin,

Please check out work by Markus Dickinson and Detmar Meurers on error 
detection in corpus annotation:

http://decca.osu.edu/

For POS and Penn treebank-style annotation, the relevant publications are 
from 2003-2005.  The DECCA software includes code for detecting errors in 
POS annotation, Penn treebank-style syntax trees, syntactic annotation 
with discontinuous constituents, and dependency annotation.
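
A minimal sketch of the general idea of checking annotation for inconsistency, offered only as an illustration: it is not the DECCA code and makes no claim to match it. It assumes the corpus is available as lists of (word, tag) pairs, and it flags word n-grams that recur with more than one tag sequence. The function name find_tag_variation and the toy data are hypothetical. Flagged items are candidates for manual review, since some variation is genuine ambiguity.

```python
from collections import defaultdict

def find_tag_variation(tagged_sents, n=3):
    """Map each word n-gram to its tag sequences; keep those with more than one."""
    seen = defaultdict(set)
    for sent in tagged_sents:
        for i in range(len(sent) - n + 1):
            window = sent[i:i + n]
            words = tuple(word for word, _ in window)
            tags = tuple(tag for _, tag in window)
            seen[words].add(tags)
    # Only n-grams annotated in at least two different ways are of interest.
    return {words: tags for words, tags in seen.items() if len(tags) > 1}

# Toy data (hypothetical), as lists of (word, tag) pairs.
toy_corpus = [
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("the", "DT"), ("run", "VB"), ("ended", "VBD")],
    [("they", "PRP"), ("run", "VBP"), ("daily", "RB")],
]

for words, tag_seqs in find_tag_variation(toy_corpus).items():
    print(" ".join(words), sorted(tag_seqs))
```

A longer n gives fewer but more reliable candidates, because identical words in a wider identical context are less likely to differ legitimately.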

-Adriane

On Fri, 17 Jun 2011, Kevin B. Cohen wrote:

> Does anyone know of any tricks for automatically checking a Penn
> Treebank-style corpus for obvious errors?  I've done some simple stuff
> in the past for checking POS tags, like looking for punctuation marks
> with non-punctuation tags, which turned out to be really fruitful, but
> I can't think of anything clever to do for the syntactic structures.
>
> Kev
>
>
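
The punctuation check described in the quoted message could look roughly like the sketch below. It assumes the input is a list of (token, tag) pairs; the names PUNCT_CHARS, PUNCT_TAGS and suspicious_punctuation are hypothetical, and the tag set is an assumed subset of Penn Treebank punctuation tags that would need adjusting for a real corpus. Hits are candidates for manual review, not confirmed errors.

```python
# Characters treated as punctuation for this check (an assumption).
PUNCT_CHARS = set(",.:;?!`'\"-()")
# Assumed subset of Penn Treebank-style punctuation tags.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}

def suspicious_punctuation(tagged_tokens):
    """Return (index, token, tag) for punctuation-only tokens with a non-punctuation tag."""
    return [
        (i, tok, tag)
        for i, (tok, tag) in enumerate(tagged_tokens)
        if tok and all(ch in PUNCT_CHARS for ch in tok) and tag not in PUNCT_TAGS
    ]

# Toy data (hypothetical): the comma is deliberately mistagged as NN.
sample = [("Hello", "UH"), (",", "NN"), ("world", "NN"), (".", ".")]
flagged = suspicious_punctuation(sample)
print(flagged)
```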
