[Corpora-List] Short Text Corpus
Torsten Zesch
zesch at tk.informatik.tu-darmstadt.de
Wed May 26 12:32:55 UTC 2010
Dear Khaled,
> I'm looking for a corpus of short text pairs (e.g. sentence pairs) for
> measuring similarity. Could anyone please point me to such a
> resource?
Here are some pointers to datasets that have been used for that purpose before:
a) Microsoft Paraphrase Corpus
   http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/
b) Li, Y., McLean, D., Bandar, Z., O'Shea, J., and Crockett, K. (2006). Sentence similarity
   based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and
   Data Engineering, 18(8):1138-1150.
   http://www2.docm.mmu.ac.uk/STAFF/J.Oshea/TRMMUCCA20081_5.pdf
c) Lee, M. D., Pincombe, B., and Welsh, M. (2005). An empirical evaluation of models
   of text document similarity. In Proceedings of the 27th Annual Conference of the
   Cognitive Science Society, pages 1254-1259.
   Available upon request, as far as I know.
-Torsten
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> KHALED OMAR
> Sent: Thursday, 20 May 2010 10:17
> To: corpora at uib.no
> Subject: [Corpora-List] Short Text Corpus
>
> Dear all,
>
> I'm looking for a corpus of short text pairs (e.g. sentence pairs) for
> measuring similarity. Could anyone please point me to such a
> resource?
>
> Thank you so much in advance.
>
> Khaled
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora