[Corpora-List] Newspaper Corpora
Jan Strunk
strunk at linguistics.ruhr-uni-bochum.de
Mon Apr 14 14:16:11 UTC 2003
Hello,
I would like to evaluate a sentence boundary
and abbreviation detection algorithm on as
many different languages as possible.
Therefore, I am searching for newspaper corpora
that are either freely available or not too expensive.
The languages in question should use the period
as an ambiguous token denoting either a sentence
boundary, an abbreviation or both.
I am already using parts of the Wall Street Journal Corpus,
the Neue Zürcher Zeitung and some corpora
included in the Multilingual Corpus I from the European Corpus Initiative.
I also know about TRACTOR.
I would be very grateful for any suggestions.
Best regards,
Jan Strunk
strunk at linguistics.ruhr-uni-bochum.de
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum
Germany