[Corpora-List] New LDC Corpora

Joel Tetreault tetreaul at cs.rochester.edu
Tue Sep 16 15:47:11 UTC 2003


hi, we'll take both.  thanks, Joel

On Tue, 16 Sep 2003, ldc at ldc.upenn.edu wrote:

>
>
>                            LDC2003T11
>                     *   ACE-2 Version 1.0   *
>
>                            LDC2003T13
>             *   Message Understanding Conference (MUC) 6   *
>
> The Linguistic Data Consortium (LDC) is pleased to announce the
> availability of two new corpora.
>
>                                *
>
> ACE-2 Version 1.0 supports the Automatic Content Extraction (ACE)
> program, whose objective is to develop extraction technology to support
> automatic processing of source language data. This includes
> classification, filtering, and selection based on the language content
> of the source data, i.e., based on the meaning conveyed by the data.
> Thus, the ACE program requires the development of technologies that
> automatically detect and characterize this meaning. The ACE research
> objectives are viewed as the detection and characterization of Entities,
> Relations, and Events.
>
> Annotations for the ACE-2 corpus concern two research tasks: Entity
> Detection and Tracking (EDT) and Relation Detection and Characterization
> (RDC).  ACE-2 contains two sets of data: training and devtest. Each of
> these sets is further divided by source: broadcast news, newspaper, and
> newswire. There are 179,007 words of source data in 519 files.
>
> For further information about this corpus, including a link to online
> documentation and the NIST ACE program site, please visit:
>
> http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T11
>
>
> Institutions that have membership in the LDC during the 2003
> Membership Year will be able to receive this corpus free of charge.
> Nonmembers may license this publication for US$500.
>
>
>                                *
>
> In the 1990s, the MUC evaluations funded the development of metrics and
> statistical algorithms to support government evaluations of emerging
> information extraction technologies.  The Message Understanding
> Conference (MUC) 6 corpus contains 318 annotated Wall Street Journal
> articles, scoring software, and corresponding documentation used in the
> MUC 6 evaluation. Both the MUC 6 Additional News Text (LDC96T10) corpus
> and the MUC 6 corpus are necessary in order to replicate the evaluation.
>
> All the materials have been published as received from the corpus
> authors.  No quality control has been conducted at the LDC; however, the
> text files have been uncompressed.
>
> For further information, including online documentation and a link to
> NIST's MUC pages, please visit:
>
> http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T13
>
> Institutions that have membership in the LDC during the 2003
> Membership Year will be able to receive this corpus free of charge.
> Nonmembers may license this publication for US$100.
>
>
>                                *
>
>
> MUC VI Text Collection (LDC96T10) has been renamed MUC 6 Additional News
> Text.  The new title more accurately reflects the corpus data as it
> consists only of additional training materials for the MUC 6 evaluation.
>
>
>
> If you need additional information before placing your order, or
> would like to inquire about membership in the LDC, please send email to
>  or call (215) 573-1275.
>
>
> ---------------------------------------------------------------------
> Linguistic Data Consortium          Phone: (215) 573-1275
> 3600 Market Street                  Fax:   (215) 573-2175
> Suite 810                           email: ldc at ldc.upenn.edu
> Philadelphia, PA 19104-2653         www: http://www.ldc.upenn.ed
>
>


