[Corpora-List] Preposition corpora
Ken Litkowski
ken at clres.com
Sun Apr 21 21:01:02 UTC 2013
The Preposition Project now has three corpora available for use in
studying preposition behavior. These are (1) the training and test sets
used in the SemEval-2007 task on preposition disambiguation, drawn from
FrameNet (FN), (2) a set of sentences from the Oxford English Corpus
(OEC) as examples for senses in the Oxford Dictionary of English (ODE),
and (3) a set of sentences from the written portion of the British
National Corpus (BNC), drawn using the methodology of the Corpus Pattern
Analysis (CPA) project. The first corpus covers 34 prepositions, while
the latter two include all single-word prepositions and many phrasal
prepositions. Each corpus consists of sentences in the SemEval
format. In addition, each sentence has been lemmatized, part-of-speech
tagged, and parsed with a dependency parser. Together, the three corpora
contain over 80,000 sentences.
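
For anyone who wants to load the sentences programmatically, here is a
minimal Python sketch. It assumes the SemEval format resembles the
Senseval-style lexical-sample XML, in which each <instance> carries an
<answer> with a sense label and a <context> whose target preposition is
wrapped in <head>. The element and attribute names, the ids, the sense
label, and the sample sentence are all assumptions made for illustration;
please check the actual files and the paper for the exact layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance in a Senseval/SemEval-style lexical-sample layout.
# The ids, the sense label, and the sentence are invented for illustration.
SAMPLE = """
<lexelt item="about-p">
  <instance id="about.p.example.001">
    <answer instance="about.p.example.001" senseid="1(1)"/>
    <context>They talked <head>about</head> the weather .</context>
  </instance>
</lexelt>
"""


def read_instances(xml_text):
    """Return one dict per <instance> that has a <head>-marked target."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inst in root.iter("instance"):
        answer = inst.find("answer")
        context = inst.find("context")
        head = context.find("head") if context is not None else None
        if head is None:
            # Skip instances without a marked target preposition.
            continue
        rows.append({
            "id": inst.get("id"),
            "sense": answer.get("senseid") if answer is not None else None,
            "preposition": (head.text or "").strip(),
            # Text before <head> is context.text; text after it is head.tail.
            "left": (context.text or "").strip(),
            "right": (head.tail or "").strip(),
        })
    return rows


instances = read_instances(SAMPLE)
for row in instances:
    print(row["id"], row["preposition"], row["sense"])
```

Keeping the left and right context separate makes it easy to inspect
what precedes and follows the preposition; the lemma, part-of-speech,
and dependency annotations would need their own reader once their layout
is confirmed.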
These corpora can be downloaded as a single zipped file from CL Research
(http://www.clres.com), either by following the links there or directly at
http://www.clres.com/elec_dictionaries.html#tppcorp. A paper describing
how the corpora were constructed, which also serves as the reference for
them, is available (The Preposition Project Corpora
<http://www.clres.com/online-papers/TPPCorpora.pdf>).
Ken Litkowski
--
Ken Litkowski TEL.: 301-482-0237
CL Research EMAIL: ken at clres.com
9208 Gue Road Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA Blog: http://www.clres.com/blog