[Corpora-List] Summary of responses: relational databases and phrasal verbs

Timothy Baldwin tbaldwin at csli.stanford.edu
Wed Oct 29 22:31:52 UTC 2003


> Within the past month or so I posted two queries to CORPORA, one dealing
> with creating a tagger using relational databases, and the other with
> the frequency of phrasal verbs in English.  I received a number of
> replies, which are found at the following URL:
>
> http://davies-linguistics.byu.edu/responses/corpora1.htm

As a last-minute response to the question of phrasal verb frequency, I have
compiled a list of verb particles extracted from the written portion of the
BNC, along with frequencies and valence counts for each. The list is
downloadable from:

mwe.stanford.edu/resources/

along with a link to a slightly outdated description of the extraction
technique used. The frequencies are almost certainly underestimates, as I was
more interested in precision than recall when I put this data together, and
the valence judgements should be taken with a pinch of salt. By way of note,
this is the data reported in:

Villavicencio, Aline (2003) Verb-Particle Constructions and Lexical Resources,
In Proceedings of the ACL-2003 Workshop on Multiword Expressions: Analysis,
Acquisition and Treatment, Sapporo, Japan.

Baldwin, Timothy, Colin Bannard, Takaaki Tanaka and Dominic Widdows (2003) An
Empirical Model of Multiword Expression Decomposability, In Proceedings of the
ACL-2003 Workshop on Multiword Expressions: Analysis, Acquisition and
Treatment, Sapporo, Japan, pp. 89-96.

Bannard, Colin, Timothy Baldwin and Alex Lascarides (2003) A Statistical
Approach to the Semantics of Verb-Particles, In Proceedings of the ACL-2003
Workshop on Multiword Expressions: Analysis, Acquisition and Treatment,
Sapporo, Japan, pp. 65-72.


I hope this helps,


Tim


