[Corpora-List] Incidence of MWEs

Chris Butler cbutler at telefonica.net
Thu Mar 16 08:41:19 UTC 2006


I notice that recent postings on this topic are concerned largely with the
matter of opacity of meaning in MWEs - Robert Amsler's working principle "if
you can predict its meaning from its constituent parts, it
doesn't need a separate entry" effectively equates MWE with the traditional
idiom. But when corpus linguists talk about MWEs (or indeed any of the other
terms which have been used in the literature on this area) they don't just
mean idioms in this sense. Rather, in much of this work, any sequence of
words which (sometimes with internal variation) is frequently repeated in a
body of text (and in that word 'frequently' lurks a lot of trouble!) is,
according to Sinclair's 'idiom principle', assumed to be a unit which is
stored and processed as a whole rather than in terms of its separate
components. Most of these sequences are not idioms in the traditional sense,
but are semantically and syntactically transparent. Note that work based on
frequency will miss quite a lot of traditional idioms, since some of these
are quite rare in text. When Alison Wray writes about 'formulaic sequences'
from a psycholinguistic rather than a corpus-oriented viewpoint, she
again means a sequence of items which is stored, formulated and retrieved
holistically, whether transparent or not. I think this is worth pointing out
because if anyone interested in the frequency of MWEs in text reads the
corpus literature they are likely to find frequencies cited which are quite
different from those you would get if you only counted sequences whose
meanings are not predictable from those of their components (even if you could
identify such sequences reliably).
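
To make the contrast concrete, here is a minimal sketch in Python of the
frequency-based view: it simply counts repeated word sequences and keeps
those above a threshold. It is an illustration only, not taken from any of
the work mentioned above; the function name, the threshold of 2, the
sequence length of 4 and the toy text are all assumptions, and real studies
use far larger corpora and usually allow for internal variation.

```python
from collections import Counter

def repeated_sequences(tokens, n=3, min_count=2):
    """Return contiguous n-word sequences occurring at least min_count times.

    Hypothetical helper for illustration; the choice of min_count is
    exactly where the trouble with 'frequently' comes in.
    """
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(ngrams)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy text (invented): a transparent sequence that recurs, plus a
# traditional idiom that occurs only once.
text = ("at the end of the day we left . at the end of the week we rested . "
        "he kicked the bucket .")
tokens = text.split()

frequent = repeated_sequences(tokens, n=4, min_count=2)
```

On this toy text the transparent 'at the end of' passes the threshold,
while the opaque 'he kicked the bucket', occurring once, does not. A count
based on unpredictability of meaning would give the opposite result.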

Chris Butler
Honorary Professor, Centre for Applied Language Studies, University of Wales
Swansea
