LFG systems and LFG computer implementations

Mon Apr 5 19:12:18 UTC 2010

Hi Lori,

As Mark has already pointed out, XLE provides machinery for parse  
ranking based on discriminative log-linear models. For English, these  
have primarily been trained on the WSJ part of the PTB, but we (the NL  
team at Powerset) are now moving to our own annotated data, which are  
produced by means of the LFG Parsebanker from the University of  
Bergen. For German, they have been trained on the TIGER Treebank; you  
can find more details in my thesis, which is attached. I also know  
that the Fuji Xerox team that develops the Japanese ParGram grammar  
has trained such models, but I'm not sure which annotated corpus they  
have been using. Finally, the Norwegians have done initial experiments  
in parse ranking based on relatively small sets of data produced with  
their Parsebanker tool; as far as I know, they have been surprisingly  
successful given the small size of those data sets. Most of the  
remaining ParGram grammars are probably still struggling with the  
coverage needed to parse corpora and with the availability of  
treebanks, but the idea definitely is to ultimately complement those  
symbolic grammars with machine-learned models, too.
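
To make the idea of log-linear parse ranking concrete, here is a minimal sketch in Python. It is not XLE's implementation; the feature names, weights, and the function `rank_parses` are hypothetical and chosen only for illustration. Each candidate analysis is described by feature counts, scored by a weighted sum, and the scores are normalized over the candidates of one sentence.

```python
import math

def rank_parses(candidates, weights):
    """Rank candidate analyses of one sentence with a log-linear model.

    candidates: list of (label, {feature_name: count}) pairs (hypothetical format).
    weights:    {feature_name: learned weight}; unknown features count as 0.
    Returns a list of (label, probability), most probable first.
    """
    # Linear score of each candidate: sum of weight * feature count.
    scores = [
        sum(weights.get(name, 0.0) * count for name, count in feats.items())
        for _, feats in candidates
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)  # normalization over the candidates of this sentence
    ranked = [(label, e / z) for (label, _), e in zip(candidates, exps)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy example with made-up features and weights.
weights = {"pp_attach_verb": 0.8, "pp_attach_noun": -0.3, "dispreferred_mark": -1.2}
candidates = [
    ("noun-attachment", {"pp_attach_noun": 1, "dispreferred_mark": 1}),
    ("verb-attachment", {"pp_attach_verb": 1}),
]
ranked = rank_parses(candidates, weights)
```

In training, the weights would be estimated so that the analyses compatible with the treebank annotation receive high probability; the sketch only covers the ranking step.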

Another method that uses training data which may come from treebanks  
is what we call c-structure pruning. It is basically a PCFG-based way
to reduce the number of c-structures for which you solve the
f-annotations, which speeds up the parser and yields more full analyses
because there are fewer timeouts. Aoife Cahill, John Maxwell, Tracy
King, and Paul Meurer have publications on this.
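
The general idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm used in XLE: the tree format, the rule probabilities, the `unseen` floor, and the `cutoff` threshold are all hypothetical. Competing subtrees over the same span are scored with PCFG rule log-probabilities, and those that fall too far below the best one are discarded before any f-annotations would be solved.

```python
import math

def tree_logprob(tree, rule_logprobs, unseen=-20.0):
    """Sum the log-probabilities of all rules used in a tree.

    tree: (label, child, ...) where a child is a subtree tuple or a word string.
    rule_logprobs: {(label, rhs_tuple): log probability}.
    unseen: assumed floor for rules missing from the table.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    total = rule_logprobs.get((label, rhs), unseen)
    for child in children:
        if not isinstance(child, str):
            total += tree_logprob(child, rule_logprobs, unseen)
    return total

def prune(subtrees, rule_logprobs, cutoff=5.0):
    """Keep subtrees whose log-probability is within `cutoff` of the best one."""
    scored = [(tree_logprob(t, rule_logprobs), t) for t in subtrees]
    best = max(score for score, _ in scored)
    return [t for score, t in scored if score >= best - cutoff]

# Toy grammar with made-up probabilities.
rules = {
    ("NP", ("D", "N")): math.log(0.6),
    ("NP", ("N", "N")): math.log(0.001),
    ("D", ("her",)): math.log(0.5),
    ("N", ("her",)): math.log(0.01),
    ("N", ("duck",)): math.log(0.1),
}
t1 = ("NP", ("D", "her"), ("N", "duck"))
t2 = ("NP", ("N", "her"), ("N", "duck"))
kept = prune([t1, t2], rules)
```

With these toy numbers the second subtree is far less probable than the first and is dropped; a larger `cutoff` keeps more candidates at the cost of more work for the unifier.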

Finally, I tried learning the ranking of the OT marks used in the  
German grammar from TIGER Treebank data at some point - with some but  
not huge success. You can find a paper on that experiment in the LFG
2005 Proceedings:
http://csli-publications.stanford.edu/LFG/10/lfg05forstetal.pdf

Best regards,

Martin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: thesis-forst.pdf
Type: application/pdf
Size: 2262824 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/lfg/attachments/20100405/6078ebc1/attachment.pdf>