[Corpora-List] Incidence of MWEs

Afsaneh Fazly afsaneh at cs.toronto.edu
Fri Mar 17 14:40:36 UTC 2006


This is a very interesting and important question:
whether multiword units such as "kick the bucket", "make an offer",
or "light pen" should be considered as single syntactic units
with no internal structure.

The introduction to the Oxford Dictionary of Current Idiomatic
English (Vol.2, A. P. Cowie, R. Mackin, I. R. McCaig, 1983)
includes a very interesting discussion on the topic.  Although
the issue is discussed there from a lexicographical point of
view, I find it highly relevant to the original question.

There is also a very interesting corpus-based study of
so-called "fixed expressions" in English by Rosamund Moon:

  "Fixed Expressions and Idioms in English, A Corpus-based
  Approach", Oxford Studies in Lexicography and Lexicology,
  1998.

The above (and other related) studies provide evidence that
most MWUs undergo lexical and syntactic variation (although
restricted to some extent), and hence must have internal
structure.
This is especially important when working with MWUs that are
composed of a verb and a noun.  Such MWUs vary a lot in
terms of their degree of compositionality (or better said,
their degree of semantic analyzability) and hence their
degree of lexicosyntactic fixedness.
Many such MWUs (e.g., "kick the bucket", "shoot the breeze")
are to a large extent idiomatic (unanalyzable).  Others
have meanings with metaphorical relations to the literal
meanings of the constituents, and hence are considered more
analyzable, e.g., "pull strings", "push one's luck", etc.

Another very interesting class of such MWUs (with internal
structure) are those often categorized as light verb
constructions (LVCs).  Examples are "give a groan",
"make an offer", "take a walk", etc.  These are considered
semi-compositional, somewhat analyzable, and more lexically
and syntactically flexible than pure idioms.
In fact, one motivation behind using such complex predicates
is argued to be that their internal structure increases their
expressive power, e.g., one can "give a sad groan",
"make an appealing offer", or "take a long walk".

On the other hand, considering such MWUs as units with internal
structure poses another problem, and that is how they are to
be distinguished from similar-on-the-surface combinations.
One reason for making such a distinction is of course their
semantic idiosyncrasy (the interpretation of "shoot the breeze"
is very much different from that of "shoot the bird").
Another reason is that compared to compositional combinations,
MWUs are overall more constrained in the lexical and syntactic
variations they undergo, and this information should
be included in their lexical representation.
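
To make the last point concrete, a lexical entry could record the
permitted variations alongside the meaning.  The following is only
a minimal sketch in Python: the class name, its fields, and the
constraint values given for the two example entries are illustrative
assumptions, not an existing lexicon format or an attested analysis.

```python
from dataclasses import dataclass, field


@dataclass
class MWUEntry:
    """Hypothetical lexicon entry for a verb-noun MWU."""
    verb: str
    noun: str
    gloss: str
    determiners: set = field(default_factory=set)  # determiners the MWU accepts
    allows_modifier: bool = False                  # e.g. "make an appealing offer"
    allows_passive: bool = False

    def licenses(self, verb, det, modifier, noun, passive=False):
        """Return True if the observed combination fits this entry's constraints."""
        if (verb, noun) != (self.verb, self.noun):
            return False
        if det not in self.determiners:
            return False
        if modifier is not None and not self.allows_modifier:
            return False
        if passive and not self.allows_passive:
            return False
        return True


# Illustrative values only: a rigid idiom versus a more flexible LVC.
kick_the_bucket = MWUEntry("kick", "bucket", "die", {"the"})
make_an_offer = MWUEntry("make", "offer", "offer", {"a", "an", "the"},
                         allows_modifier=True, allows_passive=True)

print(make_an_offer.licenses("make", "an", "appealing", "offer"))
print(kick_the_bucket.licenses("kick", "a", None, "bucket"))
```

A combination that matches the verb and noun but violates the
recorded constraints would then be read compositionally instead.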

We have done some work on verb--noun MWUs which might be of
interest to you.
We develop statistical models that draw on such linguistic
characteristics to predict whether a given combination is
idiomatic (or, in the case of LVCs, metaphorical).
We use evidence from the lexicogrammatical fixedness of these
MWUs for this purpose.  (Some related publications can be
found here:  www.cs.toronto.edu/~afsaneh/publications.html).
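
As a rough illustration of how lexical fixedness might be turned
into a number (a toy sketch of the general idea, not the models from
the publications above): compare the association strength of a
verb-noun pair with that of variants in which the noun is replaced
by a similar noun.  The function names, the smoothing constant, the
hand-picked variant nouns, and all counts below are made up for the
example.

```python
import math
from statistics import mean, pstdev


def pmi(pair_count, verb_count, noun_count, total, smoothing=0.5):
    """Pointwise mutual information (base 2), smoothed so unseen pairs get a score."""
    return math.log2(((pair_count + smoothing) * total) / (verb_count * noun_count))


def lexical_fixedness(target, variants, pair_counts, verb_counts, noun_counts, total):
    """z-score of the target's PMI within the set {target} + variants.

    A high score means the target is far more strongly associated than
    its lexical variants, i.e. it resists noun substitution.
    """
    def score(pair):
        verb, noun = pair
        return pmi(pair_counts.get(pair, 0), verb_counts[verb], noun_counts[noun], total)

    scores = [score(p) for p in [target] + list(variants)]
    spread = pstdev(scores)
    if spread == 0:
        return 0.0
    return (scores[0] - mean(scores)) / spread


# Invented counts, for illustration only.
pair_counts = {("shoot", "breeze"): 40, ("shoot", "wind"): 1,
               ("shoot", "bird"): 30, ("shoot", "duck"): 25, ("shoot", "pigeon"): 20}
verb_counts = {"shoot": 1000}
noun_counts = {"breeze": 200, "wind": 900, "gust": 150,
               "bird": 800, "duck": 600, "pigeon": 500}
total = 1_000_000

idiom = lexical_fixedness(("shoot", "breeze"),
                          [("shoot", "wind"), ("shoot", "gust")],
                          pair_counts, verb_counts, noun_counts, total)
literal = lexical_fixedness(("shoot", "bird"),
                            [("shoot", "duck"), ("shoot", "pigeon")],
                            pair_counts, verb_counts, noun_counts, total)
print(f"shoot the breeze: {idiom:.2f}   shoot the bird: {literal:.2f}")
```

With these invented counts the idiomatic pair scores higher than
the literal one, since its variants are rare or unattested.  A
similar comparison over syntactic patterns (determiner choice,
passivization, pluralization) could capture syntactic fixedness.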

Regards,

Afsaneh Fazly
=============================================================
PhD student, Computational Linguistics Group
University of Toronto
www.cs.toronto.edu/~afsaneh
=============================================================


On Thu, 16 Mar 2006, David Brooks wrote:

> Chris Butler wrote:
> > I notice that recent postings on this topic are concerned largely with the
> > matter of opacity of meaning in MWEs - Robert Amsler's working principle "if
> > you can predict its meaning from its constituent parts, it
> > doesn't need a separate entry" effectively equates MWE with the traditional
> > idiom.
>
> Yes, and I fear this is my fault. I realise there is some difference of
> opinion on many of the matters discussed so far, and I perhaps should
> have narrowed this topic down to something more manageable.
>
> My interest being in syntax, I'm interested in the implications of MWE
> for evaluating parsers. That is to say, if you get something like "light
> pen" in a corpus, it may be tagged as an N-bar, with either a compound
> <N N> or an <Adj N>, but in principle the *syntax* will remain the same
> (tag differences aside).
>
> I would imagine this is not the case for "of course", which doesn't
> strike me as a natural prepositional-phrase; likewise "kick the bucket"
> is /syntactically/ a transitive verb-phrase, but, and here is the core
> of my original (underspecified) question, would it be tagged as a
> transitive verb-phrase, or would it be tagged as an MWE - perhaps an
> intransitive verb-like MWE?
>
> The reason I ask is that for things like PARSEVAL, this is going to have
> an impact on constituent bracket scores, and I was wondering to what
> extent it had been investigated, and how noticeable the effect of MWEs
> might be.
>
> So, I guess I'm principally interested in MWEs that cause a syntactic
> variation (from the compositional norm), and whether or not they are
> tagged in treebanks. Still it's been quite an enlightening debate...
>
> D
> --
> David Brooks
> http://www.cs.bham.ac.uk/~djb
>
>


