Corpora: New Corpora from LDC

LDC Office ldc at unagi.cis.upenn.edu
Mon Jun 19 21:18:30 UTC 2000


The Linguistic Data Consortium is pleased to announce two new
corpora.

HONG KONG LAWS PARALLEL TEXT
http://morph.ldc.upenn.edu/Catalog/LDC2000T47.html

This corpus was collected during January 1999 from
http://www.justice.gov.hk, the bilingual website of the Department
of Justice of the Hong Kong Special Administrative Region (HKSAR)
of the People's Republic of China. The corpus, available from the
LDC via FTP, consists of 313,659 parallel sentences in Chinese and
English, which have been processed and sentence-aligned.

HONG KONG NEWS PARALLEL TEXT
http://morph.ldc.upenn.edu/Catalog/LDC2000T46.html

This FTP publication consists of parallel Chinese-English news
articles that the LDC collected from the Information Services
Department of the Hong Kong Special Administrative Region (HKSAR)
of the People's Republic of China. The collection contains 18,147
aligned article pairs released by HKSAR from 1 July 1997 through
30 April 2000. Article alignment was performed automatically at
the LDC.

Because of restrictions imposed by the copyright holders, these
corpora are available only to LDC members for the 2000 membership
year. If you would like to order a copy of these corpora, please
email your request to <ldc at unagi.cis.upenn.edu>. If you need
additional information before placing your order, or would like to
inquire about membership in the LDC, please send email or call
(215) 573-1275.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL:
http://www.ldc.upenn.edu/


