<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META content="MSHTML 5.00.3103.1000" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hello,</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>I would like to evaluate a sentence
boundary</FONT></DIV>
<DIV><FONT face=Arial size=2>and abbreviation detection algorithm on
as</FONT></DIV>
<DIV><FONT face=Arial size=2>many different languages as possible.</FONT></DIV>
<DIV><FONT face=Arial size=2>Therefore, I am searching for newspaper
corpora</FONT></DIV>
<DIV><FONT face=Arial size=2>that are either freely available or not too
expensive.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>The languages in question should use the
period</FONT></DIV>
<DIV><FONT face=Arial size=2>as an ambiguous token denoting either a
sentence</FONT></DIV>
<DIV><FONT face=Arial size=2>boundary, an abbreviation or both.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>I am already using parts of the Wall Street Journal
Corpus,</FONT></DIV>
<DIV><FONT face=Arial size=2>the Neue Zürcher Zeitung and some
corpora</FONT></DIV>
<DIV><FONT face=Arial size=2>included in the Multilingual Corpus I from the
European Corpus Initiative.</FONT></DIV>
<DIV><FONT face=Arial size=2>I also know about TRACTOR.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>I would be very grateful for any
suggestions.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Best regards,</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>Jan Strunk</FONT></DIV>
<DIV><FONT face=Arial size=2><A
href="mailto:strunk@linguistics.ruhr-uni-bochum.de">strunk@linguistics.ruhr-uni-bochum.de</A></FONT></DIV>
<DIV><FONT face=Arial size=2>Sprachwissenschaftliches Institut</FONT></DIV>
<DIV><FONT face=Arial size=2>Ruhr-Universität Bochum</FONT></DIV>
<DIV><FONT face=Arial size=2>Germany</FONT></DIV>
<DIV> </DIV></BODY></HTML>