[Corpora-List] Preposition corpora

Ken Litkowski ken at clres.com
Sun Apr 21 21:01:02 UTC 2013


The Preposition Project now has three corpora available for use in 
studying preposition behavior. These are (1) the training and test sets 
used in the SemEval-2007 task on preposition disambiguation, drawn from 
FrameNet (FN), (2) a set of sentences from the Oxford English Corpus 
(OEC) as examples for senses in the Oxford Dictionary of English (ODE), 
and (3) a set of sentences from the written portion of the British 
National Corpus, sampled using the methodology of the Corpus Pattern 
Analysis (CPA) project. The first corpus covers 34 prepositions, while 
the latter two include all single-word prepositions and many phrasal 
prepositions. Each corpus consists of sentences following the SemEval 
format. In addition, each sentence has been lemmatized, part-of-speech 
tagged, and parsed with a dependency parser. These corpora contain over 
80,000 sentences.

These corpora can be downloaded as a single zipped file from CL Research 
(http://www.clres.com); the direct link is at 
http://www.clres.com/elec_dictionaries.html#tppcorp. A paper describing 
how the corpora were constructed, which also serves as the reference for 
them, is available (The Preposition Project Corpora 
<http://www.clres.com/online-papers/TPPCorpora.pdf>).

     Ken Litkowski

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road                     Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog
