[Corpora-List] 604 million SVO triples from Web text

Partha Pratim Talukdar partha.talukdar at cs.cmu.edu
Wed Dec 18 01:01:58 UTC 2013


Hello,

As part of the Never Ending Language Learning (NELL) project at CMU, we
have extracted about 604 million Subject-Verb-Object (SVO) triples from a
dependency-parsed version of the ClueWeb 2009 corpus
<http://lemurproject.org/clueweb09/> (500 million documents; dependency
parses made available by Chris Re's group). The triples are available here:

http://rtw.ml.cmu.edu/resources/svo/

Please note that we haven't done any post-processing of the extractions, so
a certain amount of noise is likely. In any case, if you find this resource
useful, please drop me a note.

Happy Holidays,
Partha