[Corpora-List] Metrics for corpus "parseability"

Miles Osborne miles at inf.ed.ac.uk
Mon Feb 4 18:37:28 UTC 2008


I must confess, the idea that a corpus can be described in terms of
"parseability" sounds a little ill-founded to me.  The choice of a particular
parsing algorithm may dictate which examples are hard to process, as will
the underlying grammar, and so on.

What would be interesting (read:  hard) would be to look at the work on
phase transitions in 3-SAT problems and the like.  So, are there underlying
graph-related characteristics of parsing which make certain sentences
intrinsically hard to process, and in particular can these characteristics be
framed in a manner that is independent of the actual parser?
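
For anyone unfamiliar with the 3-SAT result, here is a rough sketch of the
effect. It illustrates the satisfiability phase transition only and says
nothing about parsing. Random instances flip from almost always satisfiable
to almost always unsatisfiable as the clause-to-variable ratio grows, and
the hardest instances are generally reported near the crossover (a ratio of
roughly 4.3 for large instances). The function names, the brute-force check
and the instance sizes are all assumptions made for illustration; at this
size the transition is blurred.

```python
import itertools
import random


def random_3sat(num_vars, num_clauses, rng):
    """Build a random 3-SAT instance (illustrative helper, not from any library).

    Each clause is a tuple of three (variable_index, is_positive) literals
    over three distinct variables.
    """
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), 3)
        clauses.append(tuple((v, rng.random() < 0.5) for v in variables))
    return clauses


def is_satisfiable(num_vars, clauses):
    """Brute-force check over all 2**num_vars assignments (small instances only)."""
    for assignment in itertools.product((False, True), repeat=num_vars):
        if all(
            any(assignment[v] == positive for v, positive in clause)
            for clause in clauses
        ):
            return True
    return False


def satisfiable_fraction(num_vars, ratio, trials, seed=0):
    """Fraction of random instances that are satisfiable at a given clause/variable ratio."""
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(
        is_satisfiable(num_vars, random_3sat(num_vars, num_clauses, rng))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Sweep the ratio; the satisfiable fraction should fall from near 1 to near 0.
    for ratio in (2.0, 3.0, 4.0, 4.5, 5.0, 6.0, 8.0):
        print(ratio, satisfiable_fraction(10, ratio, 50))
```

The analogous experiment for parsing would need a parser-independent
quantity to play the role of the clause-to-variable ratio, which is exactly
the open question.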

Miles

-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.