[Corpora-List] Short Text Corpus

Torsten Zesch zesch at tk.informatik.tu-darmstadt.de
Wed May 26 12:32:55 UTC 2010


Dear Khaled,

> I'm looking for a corpus of pairs of short texts (e.g. sentences) for
> measuring similarity. Could anyone please point me to such a resource?

Here are some pointers to datasets that have been used for that purpose before:

a)
Microsoft Research Paraphrase Corpus
http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/

b)
Li, Y., McLean, D., Bandar, Z., O'Shea, J., and Crockett, K. (2006). Sentence similarity
based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and
Data Engineering, 18(8):1138-1150.

http://www2.docm.mmu.ac.uk/STAFF/J.Oshea/TRMMUCCA20081_5.pdf

c)
Lee, M. D., Pincombe, B., and Welsh, M. (2005). An empirical evaluation of models
of text document similarity. In Proceedings of the 27th Annual Conference of the
Cognitive Science Society, pages 1254-1259.

Available upon request, as far as I know.

-Torsten

> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> KHALED OMAR
> Sent: Thursday, 20 May 2010 10:17
> To: corpora at uib.no
> Subject: [Corpora-List] Short Text Corpus
> 
> Dear all,
> 
> I'm looking for a corpus of pairs of short texts (e.g. sentences) for
> measuring similarity. Could anyone please point me to such a resource?
> 
> Thank you so much in advance.
> 
> Khaled
> 
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora



