[Corpora-List] Summary of responses: relational databases and phrasal verbs
Timothy Baldwin
tbaldwin at csli.stanford.edu
Wed Oct 29 22:31:52 UTC 2003
> Within the past month or so I posted two queries to CORPORA, one dealing
> with creating a tagger using relational databases, and the other with
> the frequency of phrasal verbs in English. I received a number of
> replies, which are found at the following URL:
>
> http://davies-linguistics.byu.edu/responses/corpora1.htm
As a last-minute response to the question of phrasal verb frequency, I have
compiled a list of verb-particle constructions extracted from the written
portion of the BNC, along with frequencies and valence counts for each. The
list is downloadable from:
mwe.stanford.edu/resources/
along with a link to a slightly outdated description of the extraction
technique used. The frequencies are almost certainly underestimates, as I was
more interested in precision than recall when I put this data together, and
the valence judgements should be taken with a pinch of salt. For reference,
this is the data reported in:
Villavicencio, Aline (2003) Verb-Particle Constructions and Lexical Resources,
In Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis,
Acquisition and Treatment, Sapporo, Japan.
Baldwin, Timothy, Colin Bannard, Takaaki Tanaka and Dominic Widdows (2003) An
Empirical Model of Multiword Expression Decomposability, In Proceedings of the
ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and
Treatment, Sapporo, Japan, pp. 89-96.
Bannard, Colin, Timothy Baldwin and Alex Lascarides (2003) A Statistical
Approach to the Semantics of Verb-Particles, In Proceedings of the ACL-2003
Workshop on Multiword Expressions: Analysis, Acquisition and Treatment,
Sapporo, Japan, pp. 65-72.
I hope this helps,
Tim