<span class="Apple-style-span" style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">Dear list,</span><div style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">
<br></div><div style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">I would like to introduce the recently released Columbia Summarization Corpus</div>
<div style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">(freely available at: <a href="http://www.cs.columbia.edu/~kathy/Data/CSC.tar.gz" target="_blank" style="color: rgb(17, 85, 204); ">http://www.cs.columbia.edu/~kathy/Data/CSC.tar.gz</a>).</div>
<div style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); "><br></div><div style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">
<div>The Columbia Summarization Corpus (CSC) was retrieved from the output of the Newsblaster online news summarization system (<a href="http://newsblaster.cs.columbia.edu/">http://newsblaster.cs.columbia.edu/</a>), which crawls the Web for news articles, clusters them on specific topics, and produces multidocument summaries for each cluster. We collected a total of 166,435 summaries containing 2.5 million sentences and covering 2,129 days in the 2003-2011 period. The CSC can be used for purposes including, but not limited to, the following: </div>
<div><br></div><div>* Event mining </div><div>* Language generation </div><div>* Summarization </div><div>* Information retrieval </div><div>* Information extraction </div><div>* Sentiment analysis and opinion mining </div>
<div>* Question answering </div><div>* Text mining and natural language processing applications </div><div>* Language modeling for text processing </div><div>* Lexicon and ontology development </div><div>* Machine learning (supervised, semi-supervised, and unsupervised learning) </div>
<div><br></div><div>Citation: </div><div><br></div><div>William Yang Wang, Kapil Thadani, and Kathleen R. McKeown, "Identifying Event Descriptions using Co-training with Online News Summaries", in Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, Nov. 8-13, ACL-AFNLP. <a href="http://www.cs.cmu.edu/~yww/papers/ijcnlp2011.pdf" target="_blank" style="color: rgb(17, 85, 204); ">http://www.cs.cmu.edu/~yww/papers/ijcnlp2011.pdf</a> Additional references on the Columbia Newsblaster summarizer can be found on the Columbia NLP group's publications page (<a href="http://www1.cs.columbia.edu/nlp/papers.cgi">http://www1.cs.columbia.edu/nlp/papers.cgi</a>).</div>
<div><br></div><div>If you have any further questions, feel free to let me know.</div><div><br></div><div>Cheers,</div><div>William</div></div><div><br></div>-- <br>William Y. Wang<br>School of Computer Science,<br>Carnegie Mellon University.<br>
<a href="http://www.cs.cmu.edu/~yww/" target="_blank">http://www.cs.cmu.edu/~yww/</a><br>