[Corpora-List] Metrics for corpus "parseability"

Sandra Kuebler skuebler at indiana.edu
Mon Feb 4 22:27:56 UTC 2008


There is related work on the ambiguity of grammars induced from  
treebanks. Anna Corazza, Alberto Lavelli, and Giorgio Satta used  
conditional cross entropy to measure it. This may help to at least  
abstract away from the parser :)

Sandra


On Feb 4, 2008, at 5:21 PM, Miles Osborne wrote:

> Chris Brew suggested I actually explain what it is I meant:  here  
> is a sample paper on phase transitions in solving problems like 3-sat:
>
> http://www.sciencemag.org/cgi/content/abstract/264/5163/1297
>
> Props to Chris!
>
> Miles
>
> On 04/02/2008, Miles Osborne <miles at inf.ed.ac.uk> wrote:
> I must confess, the idea that a corpus can be described in terms of  
> "parseability" sounds a little ill-founded to me.  The choice of  
> particular parsing algorithm may dictate which examples are hard to  
> process, as will the underlying grammar etc etc.
>
> What would be interesting (read:  hard) would be to look at the  
> work on phase transitions in 3-sat problems and the like.  So, are  
> there underlying graph-related characteristics of parsing which  
> make certain sentences intrinsically hard to process and, in  
> particular, can these characteristics be framed in a manner that is  
> independent of the actual parser?
>
> Miles
>
> -- 
> The University of Edinburgh is a charitable body, registered in  
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

Sandra Kuebler
Indiana University
Department of Linguistics
Memorial Hall 322
1021 E. Third Street
Bloomington IN 47405
USA
phone: (812) 855-3268
fax: (812) 855-5363
email: skuebler at indiana.edu


