[Corpora-List] Phrase similarity/relatedness dataset

Alessandro Lenci alessandro.lenci at ling.unipi.it
Tue Oct 11 17:15:10 UTC 2011


The paper by Jeff Mitchell and Mirella Lapata, "Composition in
Distributional Models of Semantics", Cognitive Science, 2010 vol. 34 (8)
pp. 1388-1429 also contains a dataset of English phrases (adjective–noun,
noun–noun, and verb–object combinations) rated for semantic similarity.

Best,

--alessandro


> Hi Muhammad,
>
> There are a few data sets for sentence similarity:
>
> Li et al. 2006 http://semanticsimilarity.net/benchmark-datasets/
>
> Lee, M.D., Pincombe, B.M., & Welsh, M.B. (2005). An empirical evaluation
> of models of text document similarity. In B.G. Bara, L.W. Barsalou & M.
> Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the
> Cognitive Science Society, pp. 1254-1259. Mahwah, NJ: Erlbaum.
> http://www.socsci.uci.edu/~mdlee/ButaviciusLee2007.pdf
>
> Microsoft paraphrase corpus
> http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/
>
> I hope this helps,
>     Diana
>
> On 04/10/2011 7:21 AM, Muhammad Muhammad wrote:
>> Hi
>>
>> I have been compiling, from books of Quran commentary, a dataset of
>> around 8,000 pairs of Quranic verses that are somehow related. In
>> evaluating this dataset, I want to compare it with similar datasets in
>> which phrase pairs are tagged as related by human judges. From my
>> investigation, most existing datasets are small and deal mostly with
>> pairs of words rather than phrases/sentences.
>>
>> Any help?
>>
>> Abdul-Baquee M. Sharaf
>> PhD Student
>> Language Technologies Group
>> School of Computing
>> University of Leeds
>> UK
>> _______________________________________________
>> UNSUBSCRIBE from this page:http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>
> --
> ====================================================
> Diana Inkpen
> Associate Professor, PhD, PEng
> University of Ottawa
> School of Electrical Engineering and Computer Science
> 800 King Edward, Ottawa, ON, Canada, K1N 6N5
> http://www.site.uottawa.ca/~diana
> tel: 613-562-5800 ext. 6711
> ====================================================
>


-- 
Alessandro Lenci

Dipartimento di Linguistica "T. Bolelli"
Università di Pisa
Via Santa Maria 36
56126 PISA

tel. +39-050-2215638; +39-0503152837
WWW: http://www.humnet.unipi.it/linguistica/Docenti/Lenci/index.htm
skype: alessandro.lenci




