[Corpora-List] Phrase similarity/relatedness dataset

Diana Inkpen diana at site.uottawa.ca
Tue Oct 11 16:37:08 UTC 2011


Hi Muhammad,

There are a few data sets for sentence similarity:

Li et al. 2006 http://semanticsimilarity.net/benchmark-datasets/

Lee, M.D., Pincombe, B.M., & Welsh, M.B. (2005). An empirical evaluation 
of models of text document similarity. In B.G. Bara, L.W. Barsalou & M. 
Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the 
Cognitive Science Society, pp. 1254-1259. Mahwah, NJ: Erlbaum. 
http://www.socsci.uci.edu/~mdlee/ButaviciusLee2007.pdf

Microsoft paraphrase corpus 
http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/

I hope this helps,
    Diana

On 04/10/2011 7:21 AM, Muhammad Muhammad wrote:
> Hi
>
> I have been compiling, from books of Quran commentary, a dataset of around 8,000 pairs of Quranic verses that are related in some way. To evaluate this dataset, I want to compare it with similar datasets in which phrase pairs are tagged as related by human judges. From my investigation, most existing datasets are small and deal mostly with pairs of words rather than phrases or sentences.
>
> Any help?
>
> Abdul-Baquee M. Sharaf
> PhD Student
> Language Technologies Group
> School of Computing
> University of Leeds
> UK
> _______________________________________________
> UNSUBSCRIBE from this page:http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

-- 
====================================================
Diana Inkpen
Associate Professor, PhD, PEng
University of Ottawa
School of Electrical Engineering and Computer Science
800 King Edward, Ottawa, ON, Canada, K1N 6N5
http://www.site.uottawa.ca/~diana
tel: 613-562-5800 ext. 6711
====================================================


