Corpora: Locating sources of corpora

Christopher Cieri ccieri at
Thu Jul 27 17:51:11 UTC 2000


In case you have not already done this, you might have a look at LDC's
Catalog. We have 168 corpora
available at the moment and add about 20 per year. Most of our English
text corpora focus on news since news text is relatively easy to acquire
in large volume and covers a variety of topics. LDC also does data
collection and annotation for specific projects or sponsors provided
that we retain the right to share the data with our research

Best wishes,
Christopher Cieri
Executive Director, Linguistic Data Consortium
3615 Market Street, Philadelphia, PA 19104-2608 USA
phone: 215-573-5489, fax: 215-573-2175
mailto:Christopher.Cieri at

Sam Chiles wrote:

> Hello all, I am new to the world of Corpora and have recently been
> recruited to locate sources of Corpora for a new library in
> development by Microsoft. They are currently licensing English
> language text data covering any subject to use for linguistic
> software, such as grammar checkers. Could anyone give me a few
> pointers toward any type of corpora that could be available for use by
> Microsoft? Thank you, Sam
>
> Sam Chiles
> E-mail sam.chiles at
