Corpora: question about availability

Nina Silverberg nsilverb at astro.ocis.temple.edu
Fri May 26 14:17:47 UTC 2000


I'm looking for:

FREE corpora of text that is formatted one sentence per line or with clear
end-of-sentence markers. Could be just about anything: literature, news,
etc. Preferably not poetry or speech with lots of pauses and
hesitations. (We already have access to the free part of the Penn LDC
database.)

AND/OR

FREE programs that pull individual sentences out of bodies of text with an
algorithm that's not simply searching for punctuation (and checking that
the marks don't belong to abbreviations), etc.

Nina B. Silverberg                              Phone: (215) 707-3090
Center for Cognitive Neuroscience               Fax: (215) 707-7843
Department of Neurology
Temple University School of Medicine
3401 N. Broad St.
Philadelphia, PA 19140


