Corpora: New LDC Corpora

LDC Office ldc at ldc.upenn.edu
Fri Feb 22 16:54:36 UTC 2002


   	      *     RST Discourse Treebank      *

	*     Multiple-Translation Chinese Corpus      *


The Linguistic Data Consortium (LDC) is pleased to announce the
availability of the RST Discourse Treebank.  This ftp publication
has been authored by Lynn Carlson, Daniel Marcu, and Mary Ellen
Okurowski. It contains a selection of 385 Wall Street Journal articles
from the Penn Treebank that have been annotated with discourse structure
in the framework of Rhetorical Structure Theory (RST).  Additionally, the
corpus includes a number of human-generated extracts and abstracts
associated with the original documents.


For further information, including a link to the discourse annotation
tool used to build this corpus, please visit:


http://www.ldc.upenn.edu/Catalog/LDC2002T07.html


Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $100.


			    *


The Linguistic Data Consortium (LDC) would like to announce the
availability of the Multiple-Translation Chinese Corpus.  This ftp
publication was designed to support the development of automatic means
for evaluating translation quality.  The corpus consists of 105 stories
drawn from Mandarin Chinese journalistic text.  These stories were
translated several times into English by both human translators and
machine translation (MT) systems.


For further information, including a Chinese text with a sample English
translation, please visit:


http://www.ldc.upenn.edu/Catalog/LDC2002T01.html


Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge.
Nonmembers may purchase this publication for $400.


			   *


If you need additional information before placing your order, or
would like to inquire about membership in the LDC, please send email to
<ldc at ldc.upenn.edu> or call (215) 573-1275.


--------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
3615 Market Street                  Fax:   (215) 573-2175
Suite 200                           email: ldc at unagi.cis.upenn.edu
Philadelphia, PA 19104-2608         www: http://www.ldc.upenn.edu


