<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div> ****************************************************</div><div> <b>Manually Annotated Sub-Corpus (MASC)</b></div><div><b> Release Candidate Version</b></div><div> <a href="http://www.anc.org/MASC/download/MASC-3.0.0-RC1.tgz">www.anc.org/MASC/download/MASC-3.0.0-RC1.tgz</a> (.zip)</div><div> *****************************************************</div><div><br></div><div><b>All Open ANC and MASC data and annotations are freely downloadable for any use </b></div><div><b> (including commercial).</b></div><div><br></div>The American National Corpus project has produced a "release candidate" of the full 500K <div>Manually Annotated Sub-Corpus (MASC), which is available for download from the ANC site</div><div>(<a href="http://www.anc.org/download/MASC-3.0.0-RC1.tgz">www.anc.org/download/MASC-3.0.0-RC1.tgz</a> or .zip). The final release, which will include</div><div>full documentation and enhanced tool support, will be available by mid-November. The final </div><div>release will also be freely distributed through the Linguistic Data Consortium.<b> </b></div><div><br></div><div>The release candidate includes the 82K MASC I, released in 2010, which is fully documented at </div><div><a href="http://www.anc.org/MASC">www.anc.org/MASC</a>. The full MASC includes a 500K balanced set of nineteen genres of written </div><div>and spoken American English data annotated for logical structure (paragraphs, headings, etc.), token </div><div>and sentence boundaries, part of speech and lemma, shallow parse (noun and verb chunks), and </div><div>named entities (person, organization, location, date). Portions of the corpus are also annotated for </div><div>FrameNet frames (40K full text), Penn Treebank syntax (82K), and Opinion (50K). 
All annotations </div><div>are either manually produced or hand-validated, and represented in ISO-GrAF standoff format.</div><div><br></div><div>The MASC I Sentence Corpus containing WordNet 3.1 sense annotations of 1000 occurrences for 50 </div><div>words, accompanied by inter-annotator agreement measures, is available for download from the MASC </div><div>site. The complete Sentence Corpus, including annotations of 1000 occurrences for 114 words and </div><div>complementary annotation of 100 sentences per word for FrameNet frames, will be available by </div><div>the end of the year.</div><div><br></div><div>Co-reference annotation of the full MASC will also be added by the end of the year. Penn Treebank </div><div>syntax for the remaining 418K of the corpus will be available in late spring 2013. Currently, PropBank </div><div>annotations of 50K of the corpus are available in their original format. TimeML annotations of the same </div><div>50K are near completion. Both PropBank and TimeML annotations will be made available in ISO-GrAF</div><div>format. </div><div><br></div><div>MultiMASC</div><div>************</div><div>We are currently seeking community members who will develop open corpora in their own languages </div><div>that are comparable to MASC in composition and, ultimately, annotations. Please see <span style="background-color: rgb(255, 255, 255); ">Ide, N. (2012). </span></div><div><a href="http://www.cs.vassar.edu/~ide/papers/comparative.pdf" style="background-color: rgb(255, 255, 255); ">MultiMASC: An Open Linguistic Infrastructure for Language Research</a><span style="background-color: rgb(255, 255, 255); ">. </span><i style="background-color: rgb(255, 255, 255); ">Proceedings of the Fifth Workshop </i></div><div><i style="background-color: rgb(255, 255, 255); ">on Building and Using Comparable Corpora</i><span style="background-color: rgb(255, 255, 255); ">. 
Contact <a href="mailto:anc@anc.org">anc@anc.org</a> if you are interested </span><span style="background-color: rgb(255, 255, 255); ">in contributing to </span></div><div><span style="background-color: rgb(255, 255, 255); ">MultiMASC.</span> </div><div><br></div><div><div>******************************************************************************************************</div><div>MASC is a <b>collaborative community effort </b>and we welcome contributions of annotations in any</div><div>format and/or data, as well as feedback on the resource.</div><div>****************************************************************************************************** </div></div><div><br></div><div><b>The American National Corpus Project</b></div><div><b>Department of Computer Science, Vassar College, New York, USA</b></div><div><b>email: <a href="mailto:anc@anc.org">anc@anc.org</a> • </b><b>web: <a href="http://www.anc.org">www.anc.org</a></b></div><div><br></div><div><br></div><div><br></div></body></html>