[Corpora-List] The Preposition Project (TPP) and new preposition corpora

Ken Litkowski ken at clres.com
Wed Nov 14 19:07:08 UTC 2012


In my efforts to understand preposition behavior, I have assembled two 
new corpora: (1) 7500 sentences exemplifying each preposition sense in 
TPP (from Oxford, up to 20 each, for 300 preps) and (2) 48,000 sentences 
constituting a representative sample for 272 preps drawn from the BNC, 
with at least 250 sentences each for 140 of them (these are not yet 
sense-tagged). These corpora add to the 25,000+ sentence corpus created 
for the SemEval 2007 prep WSD task for the 34 most common preps. The BNC 
corpus was developed with the aid of Patrick Hanks, with the intent of 
extending his corpus pattern 
analysis for verbs to preps (particularly to develop ontological 
characterizations of prep complements and governors).

Since analysis of these corpora clearly involves a great deal of work, I 
want to make them available to the wider community in the hopes of 
making more rapid progress in characterizing prep behavior. I am trying 
to build on the considerable lexicographic work behind TPP, taking 
into account how these data might be linked to FrameNet's frame elements 
(e.g., the FE taxonomy) and to other substantial lexical resources 
(WordNet, VerbNet, and PropBank). I envision the need for appropriate ML 
technologies, dependency parsing, and linguistic insights. It is my hope 
that this work will contribute substantially to research in such NLP 
areas as QA, summarization, and RTE.

More details are available at my web site on TPP 
<http://www.clres.com/prepositions.html>, the Online TPP 
<http://www.clres.com/cgi-bin/onlineTPP/find_prep.cgi>, next steps for 
TPP <http://www.clres.com/online-papers/NextTPPSteps.pdf>, and corpus 
pattern analysis for preps 
<http://www.clres.com/online-papers/CPAPreps.pdf>. I am working to gather 
this scattered material, along with the corpora, in an easily accessible 
repository. In the meantime, please direct your comments and inquiries 
to me.

     Ken Litkowski

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road                     Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog
