[Corpora-List] The Preposition Project (TPP) and new preposition corpora
Ken Litkowski
ken at clres.com
Wed Nov 14 19:07:08 UTC 2012
In my efforts to understand preposition behavior, I have assembled two
new corpora: (1) 7500 sentences exemplifying each preposition sense in
TPP (from Oxford, up to 20 each, for 300 preps) and (2) 48,000 sentences
constituting a representative sample for 272 preps drawn from the BNC,
with >=250 for 140 preps (these are not currently sense-tagged). These
corpora add to the one of 25,000+ created for the SemEval 2007 prep WSD
task for the 34 most common preps. The BNC corpus was developed with the
aid of Patrick Hanks, with the intent of extending his corpus pattern
analysis for verbs to preps (particularly to develop ontological
characterizations of prep complements and governors).
Since analysis of these corpora clearly involves a great deal of work, I
want to make them available to the wider community in the hopes of
making more rapid progress in characterizing prep behavior. I am trying
to use the considerable amount of lexicographic work invested in TPP, taking
into account how these data might be linked to FrameNet's frame elements
(e.g., the FE taxonomy) and to other substantial lexical resources
(WordNet, VerbNet, and PropBank). I envision the need for appropriate ML
technologies, dependency parsing, and linguistic insights. It is my hope
that this work will contribute substantially to research in such NLP
areas as QA, Summarization, and RTE.
More details are available at my web site on TPP
<http://www.clres.com/prepositions.html>, the Online TPP
<http://www.clres.com/cgi-bin/onlineTPP/find_prep.cgi>, next steps for
TPP <http://www.clres.com/online-papers/NextTPPSteps.pdf>, and corpus
pattern analysis for preps
<http://www.clres.com/online-papers/CPAPreps.pdf>. I am working to bring
this scattered material, along with the corpora, to an easily accessible
repository. In the meantime, please direct your comments and inquiries
to me.
Ken Litkowski
--
Ken Litkowski                  TEL.: 301-482-0237
CL Research                    EMAIL: ken at clres.com
9208 Gue Road                  Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA    Blog: http://www.clres.com/blog