[Corpora-List] Phrase similarity/relatedness dataset
Diana Inkpen
diana at site.uottawa.ca
Tue Oct 11 16:37:08 UTC 2011
Hi Muhammad,
There are a few data sets for sentence similarity:
Li et al. 2006 http://semanticsimilarity.net/benchmark-datasets/
Lee, M.D., Pincombe, B.M., & Welsh, M.B. (2005). An empirical evaluation
of models of text document similarity. In B.G. Bara, L.W. Barsalou & M.
Bucciarelli, (Eds.), Proceedings of the 27th Annual Conference of the
Cognitive Science Society, pp. 1254-1259. Mahwah, NJ: Erlbaum.
http://www.socsci.uci.edu/~mdlee/ButaviciusLee2007.pdf
Microsoft paraphrase corpus
http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/
I hope this helps,
Diana
On 04/10/2011 7:21 AM, Muhammad Muhammad wrote:
> Hi
>
> I have been compiling, from books of Quran commentary, a dataset of around 8,000 pairs of Quranic verses that are somehow related. In evaluating this dataset, I want to compare it with similar datasets in which phrase pairs are tagged as related by human judges. From my investigation, most existing datasets are small and deal mostly with pairs of words rather than phrases or sentences.
>
> Any help?
>
> Abdul-Baquee M. Sharaf
> PhD Student
> Language Technologies Group
> School of Computing
> University of Leeds
> UK
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
--
====================================================
Diana Inkpen
Associate Professor, PhD, PEng
University of Ottawa
School of Electrical Engineering and Computer Science
800 King Edward, Ottawa, ON, Canada, K1N 6N5
http://www.site.uottawa.ca/~diana
tel: 613-562-5800 ext. 6711
====================================================