<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
<div class="moz-text-html" lang="x-western">
<div align="center"><i>New publications:</i><br>
</div>
<p class="MsoNormal" style="text-align: center;" align="center"><b>-
ACE 2005 English SpatialML Annotations Version 2 -</b></p>
<p class="MsoNormal" style="text-align: center;" align="center"><b>-
SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in
Multiple Languages -</b><br>
</p>
<hr width="100%" size="2"><br>
<p><a name="ace"></a>(1) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011T02">ACE
2005
English SpatialML Annotations Version 2</a> was developed by
researchers at <a href="http://www.mitre.org/">The MITRE
Corporation</a> and applies SpatialML tags to the English
newswire and broadcast training data annotated for entities,
relations and events in the <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06">ACE
2005
Multilingual Training Corpus LDC2006T06</a>. This second
version eliminates a number of annotation inconsistencies and
errors identified in <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE
2005
English SpatialML Annotations LDC2008T03</a>. In addition, the
SpatialML annotation schema has been updated from version 2.0 to
version 3.0.1; the revised annotation guidelines are included in
this release. </p>
<p>The ACE (Automatic Content Extraction) program focused on
developing technology for the automatic processing of human
language in text form, specifically the extraction of entities,
values, temporal expressions, relations and events. SpatialML is a
mark-up language for representing spatial expressions in natural
language documents. It is intended to follow the model of earlier
progress in annotating time expressions, such as
<a href="http://fofoca.mitre.org/">TIMEX2</a>, <a
href="http://timeml.org/site/index.html">TimeML</a> and the <a
href="http://www.itl.nist.gov/iad/mig/tests/ace/2005/doc/ace05eval_official_results_20060110.html">2005
ACE
guidelines</a>.</p>
<p>SpatialML includes syntax for marking up PLACEs mentioned in
text and for linking them to data from gazetteers and other
databases. LINKs are used to express relations between places,
and RLINKs to capture trajectories for relative locations. To
the extent possible, SpatialML leverages ISO and other standards
with the goal of making the scheme compatible with existing and
future corpora. SpatialML goes beyond these schemes, however, by
providing richer markup for natural language that
includes semantic features and relationships that allow mapping
to existing resources such as gazetteers. Such markup can be
useful for disambiguation, integration with mapping services and
spatial reasoning.</p>
<p>This corpus contains 210,065 total words and 17,821 unique words.
Counts of unique words can be found in doc/ldc_wordcount.csv,
which covers all words that are not part of the XML markup (i.e.,
excluding tag names, attribute names and attribute values). Unique
words are identified by comparing case-insensitive forms with
leading and trailing punctuation stripped off. "Words" consisting
solely of punctuation are discarded. </p>
<p>The principal change in the annotation schema is that "PATH"
has been generalized to "RLINK" (relative link). At the top
level, the root SpatialML tag now carries a version attribute
recording which version of SpatialML was used. A number of
smaller changes have been made to the annotation specification;
these are listed in Section 2 of the updated guidelines. </p>
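<p>Because the schema version is now recorded on the root tag, a reader can check which SpatialML version a file follows before interpreting it. The Python sketch below is illustrative only. It assumes each annotation file is well-formed XML whose root element is named exactly SpatialML, with no namespace. It also counts RLINK and PATH elements. A file that lacks the version attribute and still uses PATH presumably follows the older schema, but that is an assumption and not something this release documents. The names schema_summary and SchemaSummary are hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class SchemaSummary(NamedTuple):
    version: Optional[str]  # None when the root tag has no version attribute
    rlinks: int             # RLINK elements (current naming)
    paths: int              # PATH elements (older naming, generalized to RLINK)


def schema_summary(xml_text: str) -> SchemaSummary:
    root = ET.fromstring(xml_text)
    # Assumption: the root element is named exactly "SpatialML" (no namespace).
    if root.tag != "SpatialML":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    return SchemaSummary(
        version=root.get("version"),
        rlinks=len(root.findall(".//RLINK")),
        paths=len(root.findall(".//PATH")),
    )


# Minimal illustrative files; real annotations carry attributes omitted here.
new_file = '<SpatialML version="3.0.1"><RLINK/><RLINK/></SpatialML>'
old_file = "<SpatialML><PATH/></SpatialML>"
print(schema_summary(new_file))  # SchemaSummary(version='3.0.1', rlinks=2, paths=0)
print(schema_summary(old_file))  # SchemaSummary(version=None, rlinks=0, paths=1)
```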
<p class="MsoNormal" style="margin-bottom: 12pt;"><br>
</p>
<p class="MsoNormal" style="text-align: center;" align="center">*</p>
<p><a name="sem"></a>(2) <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011T01">SemEval-2010
Task
1 OntoNotes English: Coreference Resolution in Multiple
Languages</a> is a subset of <a
href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T04">OntoNotes
Release
2.0 LDC2008T04</a> used in <a
href="http://stel.ub.edu/semeval2010-coref/home">SemEval-2010
Task 1</a>, Coreference Resolution in Multiple Languages.
OntoNotes Release 2.0 consists of roughly 500,000 words of
English broadcast and newswire data annotated with structural
information (syntax and predicate argument structure) and
shallow semantics (word sense linked to an ontology and
coreference). This SemEval-2010 Task 1 release contains
approximately 120,000 words extracted from the OntoNotes corpus
and formatted for the SemEval task.</p>
<p>SemEval (Semantic Evaluation) is an ongoing series of
evaluations of computational semantic analysis systems. The goal
of SemEval-2010 Task 1 was to evaluate and compare automatic
coreference resolution systems for six languages (Catalan,
Dutch, English, German, Italian and Spanish) in four evaluation
settings using four metrics. Further information about Task 1
can be found on the <a
href="http://stel.ub.edu/semeval2010-coref/node/7">task
description website</a>. </p>
The data is divided into three sets: the development set, which
contains 39 documents, 741 sentences and 17,044 tokens; the
training set, which contains 229 documents, 3,648 sentences and
79,060 tokens; and the test set, which contains 85 documents,
1,141 sentences and 24,206 tokens. The complete material for
training systems consists of the development and training sets combined.
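<p>As a quick arithmetic check on the split sizes above, this Python sketch sums the figures exactly as listed. The SPLITS dictionary and the combine helper are illustrative and are not part of the release.</p>

```python
# Figures copied from the split description above.
SPLITS = {
    "development": {"documents": 39, "sentences": 741, "tokens": 17_044},
    "training": {"documents": 229, "sentences": 3_648, "tokens": 79_060},
    "test": {"documents": 85, "sentences": 1_141, "tokens": 24_206},
}
FIELDS = ("documents", "sentences", "tokens")


def combine(*names: str) -> dict:
    """Sum each field over the named splits."""
    return {field: sum(SPLITS[name][field] for name in names) for field in FIELDS}


train_material = combine("development", "training")
everything = combine("development", "training", "test")
print(train_material)  # {'documents': 268, 'sentences': 4389, 'tokens': 96104}
print(everything)      # {'documents': 353, 'sentences': 5530, 'tokens': 120310}
```

<p>The combined training material comes to 268 documents, 4,389 sentences and 96,104 tokens. All three sets together total 120,310 tokens, which is consistent with the "approximately 120,000 words" quoted for this release, assuming tokens and words are counted comparably.</p>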
<br>
<br>
SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in
Multiple Languages is distributed via web download.<br>
<br>
This data is available at no charge. Non-members may request this
data by completing a copy of the <a
href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf">LDC
User
Agreement for Non-Members</a>. The agreement can be faxed to +1
215 573 2175 or scanned and emailed to <a href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>. <br>
<hr width="100%" size="2"><br>
<div align="center">
<pre class="moz-signature" cols="72">Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium Phone: 1 (215) 573-1275
University of Pennsylvania Fax: 1 (215) 573-2175
3600 Market St., Suite 810 <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>
</div>
</div>
</body>
</html>