[Corpora-List] Syntax-based Sentence Similarity measures

Jason Eisner jason at cs.jhu.edu
Sat Nov 22 18:17:03 UTC 2008


2008/11/22 ben dbabis samira <bendbabis_samira at yahoo.fr>:
> I'm working on sentence similarity. I want to know if there are
> measures that calculate the similarity between two sentences using
> syntactic information (grammatical categories, dependency relations, ...),
> i.e., measures that take into account the structure of the whole sentence
> (not a word-level measure that treats a sentence as a bag of words).

There have been a number of papers on various tree kernels and path
kernels (easily found by searching).  Each parse tree is mapped to a
high-dimensional vector that records the counts of various
substructures such as complete and incomplete subtrees,
subcategorization frames, and/or dependency paths.  The similarity of
two trees is then defined as the dot product of their vectors.  This
dot product can typically be found efficiently by dynamic programming
over the pair of trees, without having to expand out the actual
high-dimensional vector for each tree.  (An instance of the "kernel
trick.")
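
To make the dynamic program concrete, here is a minimal sketch in the
style of the subset-tree kernel of Collins and Duffy.  It is an
illustration under stated assumptions, not any particular paper's
exact formulation: the nested-tuple tree encoding, the function names,
and the `decay` parameter are all hypothetical choices made for this
example.  The value returned equals the dot product of the two
(never materialized) vectors of tree-fragment counts.

```python
from functools import lru_cache
from math import sqrt

# Assumed encoding: a tree is (label, child, child, ...); a word is a plain str.

def nodes(tree):
    """Yield every internal node of a tree (words are skipped)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """The rule at a node, e.g. NP -> D N, with words kept distinct from labels."""
    rhs = tuple(("word", c) if isinstance(c, str) else ("label", c[0])
                for c in node[1:])
    return (node[0], rhs)

def tree_kernel(t1, t2, decay=1.0):
    """Count the tree fragments shared by t1 and t2 (downweighted by decay)."""
    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Number of shared fragments rooted at this pair of nodes.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, tuple):  # same production, so c2 is a tuple too
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

def normalized_tree_kernel(t1, t2, decay=1.0):
    """Cosine-style normalization so that a tree compared with itself scores 1."""
    return tree_kernel(t1, t2, decay) / sqrt(
        tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))

t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))
print(tree_kernel(t1, t2))             # 15.0
print(normalized_tree_kernel(t1, t2))  # 0.625
```

The memoized `common` function is the dynamic program: it is evaluated
at most once per distinct pair of nodes, so the cost is bounded by the
product of the two tree sizes even though the number of distinct
fragments can grow exponentially.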

Alternatively, for an asymmetric measure, see work on quasi-synchronous
grammar, e.g., "What is the Jeopardy Model? A Quasi-Synchronous
Grammar for QA" by Mengqiu Wang, Noah A. Smith, and Teruko Mitamura
(EMNLP 2007).  http://www.cs.cmu.edu/~nasmith/papers/wang+smith+mitamura.emnlp07.pdf

Most of these methods can be extended naturally to work efficiently
over packed forests of parse trees, so that you don't have to commit
to a single parse tree for each sentence.

-cheers, jason

On Sat, Nov 22, 2008 at 6:33 AM, Paul McNamee <paul.mcnamee at jhuapl.edu> wrote:
> Cui et al. had a paper at SIGIR 2005, "Question Answering Passage Retrieval
> Using Dependency Relations":
>       http://doi.acm.org/10.1145/1076034.1076103
>
> They looked for sentences that might contain an answer to a question
> for experiments in question answering at TREC. And I believe some of
> their source code was made publicly available.
>
> You might also find some relevant work from the RTE evaluations:
>   http://www.nist.gov/tac/tracks/2008/rte/
>
> - Paul
