[Corpora-List] Automatically checking a treebank for errors

Yuval Marton yuvalmarton at gmail.com
Fri Jun 17 21:45:39 UTC 2011


Kevin,

Green and Manning (2010) did some relevant work on the Penn Arabic Treebank,
extending Dickinson (2005). I guess it could be called "semi-automatic", in
the sense that they automatically collected suspect elements and then
sampled them for manual evaluation.
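
A minimal sketch of the automatic half of such a pipeline, restricted to POS tags, might look like the following. It flags a word that receives different tags in the same one-word left/right context, which is a simplified version of the variation n-gram idea from Dickinson's work (the original work also uses longer contexts and applies the idea to syntactic annotation). The corpus format and the function names `variation_trigrams` and `sample_for_review` are hypothetical.

```python
import random
from collections import defaultdict


def variation_trigrams(corpus):
    """Flag words tagged differently in the same one-word left/right context.

    corpus: a list of sentences, each a list of (word, tag) pairs (hypothetical
    format). Returns {(left, word, right): set_of_tags} for contexts that were
    annotated with more than one tag.
    """
    tags_by_context = defaultdict(set)
    for sent in corpus:
        # Pad with boundary symbols so every token has a left and right neighbour.
        words = ["<s>"] + [w for w, _ in sent] + ["</s>"]
        for i, (word, tag) in enumerate(sent, start=1):
            tags_by_context[(words[i - 1], word, words[i + 1])].add(tag)
    # Keep only contexts whose middle word was tagged inconsistently.
    return {ctx: tags for ctx, tags in tags_by_context.items() if len(tags) > 1}


def sample_for_review(suspects, k, seed=0):
    """Draw up to k suspects at random for a human to inspect."""
    rng = random.Random(seed)
    # Sort by context first so the sample is reproducible for a given seed.
    items = sorted(suspects.items(), key=lambda item: item[0])
    return rng.sample(items, min(k, len(items)))


corpus = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("the", "DT"), ("can", "MD"), ("rusted", "VBD")],
    [("a", "DT"), ("dog", "NN")],
]
for ctx, tags in sample_for_review(variation_trigrams(corpus), k=10):
    print(ctx, sorted(tags))  # ('the', 'can', 'rusted') ['MD', 'NN']
```

Not every flagged context is an error (genuinely ambiguous words vary too), which is why the suspects are sampled for a human to judge rather than corrected automatically.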


Dickinson, M. (2005). Error Detection and Correction in Annotated Corpora.
Ph.D. thesis, The Ohio State University.

Green, S. and Manning, C. D. (2010). Better Arabic Parsing: Baselines,
Evaluations, and Analysis. In Proceedings of COLING 2010.

-Yuval




On Fri, Jun 17, 2011 at 4:36 PM, Djamé Seddah <djame.seddah at free.fr> wrote:

> Dear Kevin,
>
> you may want to have a look at the work done on the French Treebank by
> Natalie Schluter and Josef van Genabith on restructuring and correcting it
> (http://www.itu.dk/people/nael/Publications.html, NLP section).
>
>
> Best,
>
> Djamé
>
>
>
>
> On 17 June 2011, at 21:47, Kevin B. Cohen wrote:
>
> > Does anyone know of any tricks for automatically checking a Penn
> > Treebank-style corpus for obvious errors?  I've done some simple stuff
> > in the past for checking POS tags, like looking for punctuation marks
> > with non-punctuation tags, which turned out to be really fruitful, but
> > I can't think of anything clever to do for the syntactic structures.
> >
> > Kev
> >
> > --
> > Kevin Bretonnel Cohen, PhD
> > Biomedical Text Mining Group Lead, Computational Bioscience Program,
> > U. Colorado School of Medicine
> > 303-916-2417 (cell) 303-377-9194 (home)
> > http://compbio.ucdenver.edu/Hunter_lab/Cohen
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
>
>
>
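
The punctuation check described in the original question could be sketched as below, as a minimal version under stated assumptions. Preterminals are pulled out of PTB-style bracketing with a regular expression rather than a full tree reader (NLTK's `Tree.fromstring` would be sturdier). The punctuation tag set and the small allow-list of legitimate exceptions (`%` as NN, `&` as CC, `'` as POS, following common Penn Treebank practice) are assumptions to adjust for your tagset, and the names `PUNCT_TAGS`, `ALLOWED` and `suspicious_punct_tags` are hypothetical. Empty elements under `-NONE-` (such as `*`) are skipped, since they would otherwise look like punctuation.

```python
import re
import string

# A preterminal in PTB-style bracketing looks like "(TAG word)". Parentheses
# inside tokens are escaped as -LRB-/-RRB-, so a regex is workable here
# (an assumption; a full tree reader is more robust).
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

# Assumed set of Penn Treebank punctuation/symbol tags; adjust for your tagset.
PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-", "#", "$"}

# Hypothetical allow-list: punctuation-only tokens that legitimately carry
# other tags in PTB-style data.
ALLOWED = {("%", "NN"), ("&", "CC"), ("'", "POS")}

# Empty categories such as "*" sit under this tag and are not real tokens.
EMPTY_TAG = "-NONE-"


def is_punct(token):
    """True if the token consists only of ASCII punctuation characters."""
    return token != "" and all(ch in string.punctuation for ch in token)


def suspicious_punct_tags(bracketed_text):
    """Return (word, tag) pairs where a punctuation-only word has a non-punctuation tag."""
    hits = []
    for tag, word in PRETERMINAL.findall(bracketed_text):
        if tag == EMPTY_TAG:
            continue
        if is_punct(word) and tag not in PUNCT_TAGS and (word, tag) not in ALLOWED:
            hits.append((word, tag))
    return hits


tree = "( (S (NP-SBJ (DT The) (NN price)) (VP (VBD rose) (NP (CD 5) (NN %))) (, ,) (NP (-NONE- *)) (NN .)))"
print(suspicious_punct_tags(tree))  # [('.', 'NN')]
```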