[Corpora-List] Semantic parsing for the robot commands corpus by integrating spatial knowledge

Kais Dukes sckd at leeds.ac.uk
Sat Sep 7 06:55:03 UTC 2013


Dear Corpora List,

Firstly, some good news: the Robot Commands Annotation Corpus (http://www.trainrobots.com) is growing well, and we're getting close to 100,000 words thanks to everyone who's playing. However, before releasing the data, I'm keen to do some annotation, starting with automatic parsing.

A couple of things have surprised me in this new experiment in crowdsourcing robot commands online. Firstly, many commands are long and linguistically rich. But setting those aside for a moment, even parsing simple commands can be tricky, because many of the commands players type in are impossible to parse correctly without spatial context. The following is a real example typed in by a player, which only makes sense in the context of the images in the game:

"Place the red block on the yellow block on the blue block in the top left corner."

This could mean any of these possible moves:

1. Move red block (on yellow block on blue block), and put this in top left corner.
2. Move red block (on yellow block), and put this on blue block (in top left corner).
3. Move red block, and put this on yellow block (on blue block in top left corner).
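
To make the ambiguity concrete, here's a toy Python sketch (the scene representation and all the names are mine, purely for illustration) of how a handful of spatial facts could filter these readings. Each reading presupposes different facts about the scene, so only the readings consistent with the current board survive:

    # Spatial facts for one example scene: the red block sits on the
    # yellow block, which sits on the blue block; nothing is in the
    # top left corner yet.
    scene_on = {("red", "yellow"), ("yellow", "blue")}
    corner = set()

    def on(a, b): return (a, b) in scene_on
    def in_corner(b): return b in corner

    # The descriptive part of each reading presupposes different facts.
    readings = {
        1: on("red", "yellow") and on("yellow", "blue"),
        2: on("red", "yellow") and in_corner("blue"),
        3: on("yellow", "blue") and in_corner("blue"),
    }
    print([n for n, ok in readings.items() if ok])   # -> [1] for this scene

In this scene only reading 1 survives; with the blue block in the corner instead, readings 2 or 3 could win, which is exactly the ambiguity above.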

I've been considering several ways to do semantic parsing for the corpus, e.g. CCG parsing or the Stanford dependency parser. My concern with these, as far as I understand them, is that they would require a pipeline approach, which makes understanding the above sentence quite problematic. What I'd really like to do is use a parsing algorithm that lets me incorporate spatial knowledge. I would rather not do a brute-force search over the many possible parse trees output by a parser, but instead do something at parse-time.

After some consideration, I'm thinking of writing a custom statistical parser based on Goldberg and Elhadad's non-directional dependency algorithm (http://www.aclweb.org/anthology-new/N/N10/N10-1115.pdf). My intuition is that a parser using this sort of algorithm would allow me to tune the scoring function to include spatial knowledge, making it easier to perform correct long-distance PP-attachment at parse-time.
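
To sketch what I have in mind (a rough, self-contained toy, not Goldberg and Elhadad's actual implementation, and all the names and constants are mine): at each step the parser performs the single most confident attachment anywhere in the sentence, and the scoring function sees the scene's spatial facts, so grounded attachments can outscore ungrounded ones during parsing:

    # Toy easy-first (non-directional) parsing loop. Tokens are the
    # simplified noun heads of the example command; the scene facts are
    # the same on(a, b) relations as above.
    scene = {("red", "yellow"), ("yellow", "blue")}

    def score(head, dep, scene):
        s = 1.0                        # stand-in for a learned base score
        if (head, dep) in scene:       # the attachment's implied relation
            s += 5.0                   # holds in the scene: spatial bonus
        return s

    def parse(tokens, scene):
        pending = list(range(len(tokens)))   # roots of unattached subtrees
        arcs = []
        while len(pending) > 1:
            candidates = []
            for i in range(len(pending) - 1):
                l, r = pending[i], pending[i + 1]
                candidates.append((score(tokens[l], tokens[r], scene), r, l))
                candidates.append((score(tokens[r], tokens[l], scene), l, r))
            # Most confident attachment first (ties here happen to break
            # toward the rightmost pair, attaching inner modifiers first).
            _, dep, head = max(candidates)
            arcs.append((dep, head))
            pending.remove(dep)              # dep is consumed; head remains
        return arcs

    # "red (block on the) yellow (block on the) blue (block)"
    print(parse(["red", "yellow", "blue"], scene))   # -> [(2, 1), (1, 0)]

Here the spatial bonus yields the attachments of reading 1. A trained scorer with richer features would replace the toy constants, but the point is that grounding happens inside the parsing loop rather than in a post-hoc reranking pass.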

In a nutshell, I'm looking for suggestions on how to do joint parsing and spatial disambiguation: ideally not a pipeline approach, but something integrated. Surely there must be a good way to use spatial knowledge while parsing? Could my idea of modifying the non-directional dependency parser work? The parser could then leverage a relational knowledge base for each scene; for example, if the parser knew that the red block is on the yellow block, it could produce the correct parse tree for the example above. Hopefully the same approach would reduce error propagation when I come to the larger, more complex sentences in the corpus.
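
For the relational knowledge base itself, something quite small might do. Here's a rough sketch (the board format is invented; the game's real state representation will differ) of extracting on(a, b) facts per scene, which is all the scoring function above needs:

    # Build per-scene spatial facts from a grid of blocks. The board maps
    # (x, y, z) cells to colours; on(a, b) holds when a sits directly on b.
    def scene_facts(board):
        facts = set()
        for (x, y, z), colour in board.items():
            below = board.get((x, y, z - 1))
            if below is not None:
                facts.add((colour, below))
        return facts

    board = {(0, 0, 0): "blue", (0, 0, 1): "yellow", (0, 0, 2): "red"}
    print(scene_facts(board))   # {('red', 'yellow'), ('yellow', 'blue')}

Further relations ("in the corner", "nearest", etc.) could be added the same way as the commands get richer.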

Feedback or thoughts would be most welcome.

Regards,
-- Kais