<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.499999046325684px;background-color:rgb(255,255,255)">Nancy, I've read your paper about MultiMASC. Very interesting!</span><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.499999046325684px;background-color:rgb(255,255,255)">
<br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.499999046325684px;background-color:rgb(255,255,255)">I wonder whether MASC also includes a dialogue genre among its genres.</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.499999046325684px;background-color:rgb(255,255,255)">
<br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.499999046325684px;background-color:rgb(255,255,255)">I've also seen in the FAQ that the ANC contains demographic information such as age, gender, national origin, and race. Can you point me to any studies in this area?</div>
<br><div class="gmail_quote">2012/10/15 Nancy Ide <span dir="ltr">&lt;<a href="mailto:ide@cs.vassar.edu" target="_blank">ide@cs.vassar.edu</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div> ****************************************************</div><div> <b>Manually Annotated Sub-Corpus (MASC)</b></div><div><b> Release Candidate Version</b></div>
<div> <a href="http://www.anc.org/MASC/download/MASC-3.0.0-RC1.tgz" target="_blank">www.anc.org/MASC/download/MASC-3.0.0-RC1.tgz</a> (.zip)</div><div> *****************************************************</div>
<div><br></div><div><b>All Open ANC and MASC data and annotations are freely downloadable for any use </b></div><div><b> (including commercial).</b></div><div><br></div>The American National Corpus project has produced a "release candidate" of the full 500K <div>
Manually Annotated Sub-Corpus (MASC), which is available for download from the ANC site</div><div>(<a href="http://www.anc.org/download/MASC-3.0.0-RC1.tgz" target="_blank">www.anc.org/download/MASC-3.0.0-RC1.tgz</a> or .zip). The final release, which will include</div>
<div>full documentation and enhanced tool support, will be available by mid-November. The final </div><div>release will also be freely distributed through the Linguistic Data Consortium.<b> </b></div><div><br></div><div>The release candidate includes the 82K MASC I, released in 2010, which is fully documented at </div>
<div><a href="http://www.anc.org/MASC" target="_blank">www.anc.org/MASC</a>. The full MASC includes a 500K balanced set of nineteen genres of written </div><div>and spoken American English data annotated for logical structure (paragraph, headings, etc.), token </div>
<div>and sentence boundaries, part of speech and lemma, shallow parse (noun and verb chunks), and </div><div>named entities (person, organization, location, date). Portions of the corpus are also annotated for </div><div>
FrameNet frames (40K full text), Penn Treebank syntax (82K), and Opinion (50K). All annotations </div><div>are either manually produced or hand-validated, and represented in ISO-GrAF standoff format.</div><div><br></div><div>
The MASC I Sentence Corpus, containing WordNet 3.1 sense annotations of 1000 occurrences for 50 </div><div>words, accompanied by inter-annotator agreement measures, is available for download from the MASC </div><div>site. The complete Sentence Corpus, including annotations of 1000 occurrences for 114 words and </div>
<div>complementary annotation of 100 sentences per word for FrameNet frames, will be available by </div><div>the end of the year.</div><div><br></div><div>Co-reference annotation of the full MASC will also be added by the end of the year. Penn Treebank </div>
<div>syntax for the remaining 418K of the corpus will be available in late spring, 2013. Currently, PropBank </div><div>annotations of 50K of the corpus are available in their original format. TimeML annotations of the same </div>
<div>50K are near completion. Both PropBank and TimeML annotations will be made available in ISO-GrAF</div><div>format. </div><div><br></div><div>MultiMASC</div><div>************</div><div>We are currently seeking community members who will develop open corpora in their own languages </div>
<div>that are comparable to MASC in composition and, ultimately, annotations. Please see <span style>Ide, N. (2012). </span></div><div><a href="http://www.cs.vassar.edu/~ide/papers/comparative.pdf" style target="_blank">MultiMASC: An Open Linguistic Infrastructure for Language Research</a><span style>. </span><i style>Proceedings of the Fifth Workshop </i></div>
<div><i style>on Building and Using Comparable Corpora</i><span style>. Contact <a href="mailto:anc@anc.org" target="_blank">anc@anc.org</a> if you are interested </span><span style>in contributing to </span></div><div><span style>MultiMASC.</span> </div>
<div><br></div><div><div>******************************************************************************************************</div><div>MASC is a <b>collaborative community effort </b>and we welcome contributions of annotations in any</div>
<div>format and/or data, as well as feedback on the resource.</div><div>****************************************************************************************************** </div></div><div><br></div><div><b>The American National Corpus Project</b></div>
<div><b>Department of Computer Science, Vassar College, New York, USA</b></div><div><b>email: <a href="mailto:anc@anc.org" target="_blank">anc@anc.org</a> • </b><b>web: <a href="http://www.anc.org" target="_blank">www.anc.org</a></b></div>
<div><br></div><div><br></div><div><br></div></div><br>
<br></blockquote></div><br>