[Corpora-List] Newspaper Corpora

Jan Strunk strunk at linguistics.ruhr-uni-bochum.de
Mon Apr 14 14:16:11 UTC 2003


Hello,

I would like to evaluate a sentence boundary
and abbreviation detection algorithm on as
many different languages as possible.
Therefore, I am searching for newspaper corpora
that are either freely available or not too expensive.

The languages in question should use the period
as an ambiguous token denoting either a sentence
boundary, an abbreviation or both.

I am already using parts of the Wall Street Journal Corpus,
the Neue Zürcher Zeitung and some corpora
included in the Multilingual Corpus I from the European Corpus Initiative.
I also know about TRACTOR.

I would be very grateful for any suggestions.

Best regards,

Jan Strunk
strunk at linguistics.ruhr-uni-bochum.de
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum
Germany


