<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<tt>The Preposition Project now has three corpora available for
studying preposition behavior. These are (1) the
training and test sets used in the SemEval-2007 task on
preposition disambiguation, drawn from FrameNet (FN), (2) a set of
sentences from the Oxford English Corpus (OEC) that serve as examples
for senses in the Oxford Dictionary of English (ODE), and (3) a set of
sentences from the written portion of the British National Corpus,
drawn using the methodology of the Corpus Pattern Analysis project
(CPA). The first corpus covers 34 prepositions, while the latter
two include all single-word prepositions and many phrasal
prepositions. Each corpus consists of sentences in the
SemEval format. In addition, each sentence has been lemmatized,
part-of-speech tagged, and parsed with a dependency parser. Together,
the corpora contain over 80,000 sentences.<br>
<br>
The corpora can be downloaded as a single zipped file from CL
Research (<a href="http://www.clres.com">http://www.clres.com</a>)
by following the links there, in particular <a
href="http://www.clres.com/elec_dictionaries.html#tppcorp">http://www.clres.com/elec_dictionaries.html#tppcorp</a>.
A paper that describes how the corpora were constructed and serves as
their reference is also available (<a
href="http://www.clres.com/online-papers/TPPCorpora.pdf">The Preposition
Project Corpora</a>).<br>
<br>
Ken Litkowski<br>
</tt>
<pre class="moz-signature" cols="72">--
Ken Litkowski TEL.: 301-482-0237
CL Research EMAIL: <a class="moz-txt-link-abbreviated" href="mailto:ken@clres.com">ken@clres.com</a>
9208 Gue Road Home Page: <a class="moz-txt-link-freetext" href="http://www.clres.com">http://www.clres.com</a>
Damascus, MD 20872-1025 USA Blog: <a class="moz-txt-link-freetext" href="http://www.clres.com/blog">http://www.clres.com/blog</a>
</pre>
</body>
</html>