Kevin,<br><br>Green and Manning (2010) did some relevant work on the Penn Arabic Treebank, extending Dickinson (2005). I guess it could be called "semi-automatic", in the sense that they automatically collected suspect elements and then manually sampled them for human evaluation.<br>
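<br>To give a rough idea of what "automatically collecting suspect elements" can mean, here is a minimal Python sketch. It is a deliberately simplified take on the variation idea (the same word string bracketed with different labels in different places), not Dickinson's or Green and Manning's actual implementation: it ignores surrounding context and strings that are sometimes not bracketed at all. The function names (parse, spans, label_variation) and the toy trees are made up for illustration, and the parser assumes one tree per string with no empty outer bracket.<br>

```python
from collections import defaultdict


def parse(s):
    """Parse one bracketed tree into (label, children); leaves are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def spans(tree, out):
    """Append (label, words) for each node to out; return the words the node covers."""
    label, children = tree
    words = []
    for child in children:
        words.extend(spans(child, out) if isinstance(child, tuple) else [child])
    if len(children) == 1 and isinstance(children[0], tuple):
        # Unary chain: parent and child cover the same words, so merge them
        # into one record (e.g. "S/VP") instead of reporting a false variation.
        child_label, _ = out.pop()
        label = label + "/" + child_label
    out.append((label, tuple(words)))
    return words


def label_variation(trees, min_len=2):
    """Map each word string of at least min_len words to its labels, if it has more than one."""
    seen = defaultdict(set)
    for t in trees:
        out = []
        spans(parse(t), out)
        for label, words in out:
            if len(words) >= min_len:
                seen[words].add(label)
    return {w: labels for w, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    # Toy corpus (invented): the third tree mislabels "the report" as ADVP.
    corpus = [
        "(S (NP (DT the) (NN report)) (VP (VBD arrived)))",
        "(S (NP (PRP I)) (VP (VBD read) (NP (DT the) (NN report))))",
        "(S (NP (PRP I)) (VP (VBD read) (ADVP (DT the) (NN report))))",
    ]
    for words, labels in label_variation(corpus).items():
        print(" ".join(words), sorted(labels))
```

Anything this flags is only a candidate: genuinely ambiguous strings will show up too, which is why the hits still need to be sampled and checked by hand.<br>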
<br><br>Markus Dickinson. Error Detection and Correction in Annotated Corpora. Ph.D. thesis, The Ohio State University, 2005.<br><br>Spence Green and Christopher D. Manning. Better Arabic Parsing: Baselines, Evaluations, and Analysis. COLING 2010.<br>
<br>-Yuval<br><br><br><br><br><div class="gmail_quote">On Fri, Jun 17, 2011 at 4:36 PM, DJamé Seddah <span dir="ltr"><<a href="mailto:djame.seddah@free.fr">djame.seddah@free.fr</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Dear Kevin,<br>
<br>
you may want to have a look at the work done on the French Treebank by Natalie Schluter and Josef van Genabith<br>
on restructuring and correcting a treebank for French (<a href="http://www.itu.dk/people/nael/Publications.html" target="_blank">http://www.itu.dk/people/nael/Publications.html</a>, NLP section).<br>
<br>
<br>
Best,<br>
<br>
Djamé<br>
<br>
<br>
<br>
<br>
On 17 June 2011 at 21:47, Kevin B. Cohen wrote:<br>
<div><div></div><div class="h5"><br>
> Does anyone know of any tricks for automatically checking a Penn<br>
> Treebank-style corpus for obvious errors? I've done some simple stuff<br>
> in the past for checking POS tags, like looking for punctuation marks<br>
> with non-punctuation tags, which turned out to be really fruitful, but<br>
> I can't think of anything clever to do for the syntactic structures.<br>
><br>
> Kev<br>
><br>
> --<br>
> Kevin Bretonnel Cohen, PhD<br>
> Biomedical Text Mining Group Lead, Computational Bioscience Program,<br>
> U. Colorado School of Medicine<br>
> <a href="tel:303-916-2417" value="+13039162417">303-916-2417</a> (cell) <a href="tel:303-377-9194" value="+13033779194">303-377-9194</a> (home)<br>
> <a href="http://compbio.ucdenver.edu/Hunter_lab/Cohen" target="_blank">http://compbio.ucdenver.edu/Hunter_lab/Cohen</a><br>
><br>
> _______________________________________________<br>
> UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
> Corpora mailing list<br>
> <a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
> <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br>
<br>
</div></div></blockquote></div><br>
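For the POS-tag check Kevin mentions in the quoted message (punctuation tokens carrying non-punctuation tags), a minimal Python sketch might look like the following. The tag set and the helper names (PUNCT_TAGS, is_punct, suspicious_tags) are assumptions for illustration, not taken from any particular toolkit, and the tag set should be adjusted to the corpus at hand.<br>

```python
import string

# Assumed set of Penn Treebank-style punctuation and symbol tags; adjust as needed.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-", "#", "$", "SYM"}

# Bracket tokens as they are usually escaped in bracketed treebank files.
ESCAPED_BRACKETS = {"-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"}


def is_punct(token):
    """True if the token is an escaped bracket or consists only of punctuation characters."""
    return token in ESCAPED_BRACKETS or all(ch in string.punctuation for ch in token)


def suspicious_tags(tagged_tokens):
    """Return the (word, tag) pairs where a punctuation token has a non-punctuation tag."""
    return [(w, t) for w, t in tagged_tokens if is_punct(w) and t not in PUNCT_TAGS]


if __name__ == "__main__":
    # Invented example: the comma is wrongly tagged NN.
    sentence = [("dogs", "NNS"), (",", "NN"), ("bark", "VBP"), (".", ".")]
    print(suspicious_tags(sentence))
```

The flagged pairs are candidates for manual review rather than confirmed errors, since some symbol tokens conventionally receive ordinary tags.<br>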