<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <div class="moz-text-html" lang="x-western">

      <div align="center"><i>New publications:</i><br>

        </div>

      <p class="MsoNormal" style="text-align: center;" align="center"><b>-
        ACE 2005 English SpatialML Annotations Version 2 -</b></p>

      <p class="MsoNormal" style="text-align: center;" align="center"><b>-
        SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in
        Multiple Languages -</b><br>
      </p>

      <hr width="100%" size="2"><br>

      <p><a name="ace"></a>(1) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011T02">ACE

2005

          English SpatialML Annotations Version 2</a> was developed by

        researchers at <a href="http://www.mitre.org/">The MITRE

          Corporation</a> and applies SpatialML tags to the English

        newswire and broadcast training data annotated for entities,

        relations and events in the <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T06">ACE

2005

          Multilingual Training Corpus LDC2006T06</a>. This second

        version eliminates a number of annotation inconsistencies and

        errors identified in <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03">ACE

2005

          English SpatialML Annotations LDC2008T03</a>. In addition, the

        SpatialML annotation schema has been updated from version 2.0 to

        version 3.0.1; the revised annotation guidelines are included in

        this release. </p>

      <p>The ACE (Automatic Content Extraction) program focused on

        developing automatic content extraction technology to support

        automatic processing of human language in text form,
        specifically the extraction of entities, values, temporal
        expressions, relations and events. SpatialML is a markup language
        for representing spatial expressions in natural language
        documents. It is intended to emulate earlier progress on time
        expressions, such as

        <a href="http://fofoca.mitre.org/">TIMEX2</a>, <a

          href="http://timeml.org/site/index.html">TimeML</a>, and the <a

href="http://www.itl.nist.gov/iad/mig/tests/ace/2005/doc/ace05eval_official_results_20060110.html">2005

ACE

          guidelines</a>.</p>

      <p>SpatialML includes syntax for marking up PLACEs mentioned in

        text and for linking them to data from gazetteers and other

        databases. LINKs are used to express relations between places,

        and RLINKs to capture trajectories for relative locations. To

        the extent possible, SpatialML leverages ISO and other standards

        with the goal of making the scheme compatible with existing and

        future corpora. SpatialML goes beyond these schemes, however, by
        providing a richer markup for natural language that

        includes semantic features and relationships that allow mapping

        to existing resources such as gazetteers. Such markup can be

        useful for disambiguation, integration with mapping services and

        spatial reasoning.</p>

      <p>This corpus contains 210,065 total words and 17,821 unique words.
        Counts of unique words can be found in doc/ldc_wordcount.csv,
        which covers all words that are not part of the XML markup (i.e.,
        excluding tag names, attribute names and attribute values). Unique
        words are counted by comparing case-insensitive forms with leading
        and trailing punctuation stripped off. "Words" consisting solely
        of punctuation are discarded. </p>
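      <p>The counting rule described above can be sketched roughly as
        follows. This is a minimal illustration, not LDC's actual script:
        it assumes the XML markup has already been removed, that words are
        separated by whitespace, that "punctuation" means ASCII punctuation
        (Python's string.punctuation), and that punctuation-only tokens are
        left out of the total as well as the unique count. The function
        name count_words is hypothetical.</p>
<pre>
```python
import string
from collections import Counter


def count_words(text):
    """Return (total words, unique words, per-word counts) for plain text.

    Hypothetical helper: assumes XML markup was already removed and that
    words are separated by whitespace.
    """
    counts = Counter()
    for token in text.split():
        # Strip leading and trailing ASCII punctuation, then fold case
        # so that "Boston" and "boston." count as the same word.
        word = token.strip(string.punctuation).lower()
        if not word:
            # The token consisted solely of punctuation: discard it.
            continue
        counts[word] += 1
    return sum(counts.values()), len(counts), counts


total, unique, counts = count_words('The PLACE "Boston" -- and the place, Boston.')
print(total, unique)  # prints: 7 4
```
</pre>
      <p>In this example the "--" token is dropped, and "PLACE"/"place,"
        and "Boston"/"Boston." each collapse to a single unique word.</p>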

      <p>The principal change in the annotation schema is that "PATH"
        has been generalized to "RLINK" (relative link). At the top

        level, there is now a version attribute on the root SpatialML

        tag to capture which version of SpatialML was used. A number of

        smaller changes have been made to the annotation specification;

        these are listed in Section 2 of the updated guidelines. </p>

      <p class="MsoNormal" style="margin-bottom: 12pt;"><br>

      </p>

      <p class="MsoNormal" style="text-align: center;" align="center">*</p>

      <p><a name="sem"></a>(2) <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2011T01">SemEval-2010

Task

          1 OntoNotes English: Coreference Resolution in Multiple

          Languages</a> is a subset of <a

href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T04">OntoNotes

Release

          2.0 LDC2008T04</a> used in <a

          href="http://stel.ub.edu/semeval2010-coref/home">SemEval-2010

          Task 1</a>, Coreference Resolution in Multiple Languages.

        OntoNotes Release 2.0 consists of roughly 500,000 words of

        English broadcast and newswire data annotated with structural

        information (syntax and predicate-argument structure) and

        shallow semantics (word sense linked to an ontology and

        coreference). This SemEval-2010 Task 1 release contains

        approximately 120,000 words extracted from the OntoNotes corpus

        and formatted for the SemEval task.</p>

      <p>SemEval (Semantic Evaluation) is an ongoing series of

        evaluations of computational semantic analysis systems. The goal

        of SemEval-2010 Task 1 was to evaluate and compare automatic

        coreference resolution systems for six languages (Catalan,

        Dutch, English, German, Italian and Spanish) in four evaluation

        settings using four metrics. Further information about Task 1

        can be found on the <a

          href="http://stel.ub.edu/semeval2010-coref/node/7">task

          description website</a>. </p>

      The data is divided into three sets: the development set, which
      contains 39 documents, 741 sentences and 17,044 tokens; the
      training set, which contains 229 documents, 3,648 sentences and
      79,060 tokens; and the test set, which contains 85 documents,
      1,141 sentences and 24,206 tokens. The complete material for
      training systems consists of the development and training sets
      combined.
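      <p>As a quick check of the split sizes, the sketch below simply adds
        up the per-set counts stated in this announcement; the names
        splits and add are illustrative only. The combined training
        material comes to 268 documents, 4,389 sentences and 96,104
        tokens, and all three sets together total 120,310 tokens, which
        agrees with the figure of approximately 120,000 words given for
        this release (assuming tokens and words are counted comparably).</p>
<pre>
```python
# Document, sentence and token counts per set, as given in the announcement.
splits = {
    "development": (39, 741, 17044),
    "training": (229, 3648, 79060),
    "test": (85, 1141, 24206),
}


def add(*rows):
    """Sum (documents, sentences, tokens) tuples column by column."""
    return tuple(sum(column) for column in zip(*rows))


training_material = add(splits["development"], splits["training"])
whole_release = add(*splits.values())
print("development + training:", training_material)  # (268, 4389, 96104)
print("all three sets:", whole_release)  # (353, 5530, 120310)
```
</pre>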

      <br>

      <br>

      SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in

      Multiple Languages is distributed via web download.<br>

      <br>

      This data is available at no charge.  Non-members may request this

      data by completing a copy of the <a

href="http://www.ldc.upenn.edu/Membership/Agreements/licenses/genericlicense.pdf">LDC

User

        Agreement for Non-Members</a>. The agreement can be faxed to +1
      215 573 2175 or scanned and emailed to this address.<br>

      <hr width="100%" size="2"><br>

      <div align="center">

        <pre class="moz-signature" cols="72">Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium                  Phone: 1 (215) 573-1275
University of Pennsylvania                    Fax: 1 (215) 573-2175
3600 Market St., Suite 810                        <a class="moz-txt-link-abbreviated" href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>
Philadelphia, PA 19104 USA                 <a class="moz-txt-link-freetext" href="http://www.ldc.upenn.edu">http://www.ldc.upenn.edu</a></pre>

      </div>

    </div>

  </body>

</html>