[Corpora-List] Incidence of MWEs

Adam Kilgarriff adam at lexmasterclass.com
Tue Mar 14 12:28:43 UTC 2006


> I was wondering if anyone has estimated the incidence of multi-word 
> expressions in language. 

Wonderful, enormous, bottomless question!

I heard an account of the 'phraseology' symposium at Leeds Uni in 1994 where
the level of interest in and enthusiasm for the topic was such that, at the
beginning of the event, people were arguing heatedly about 30% of the
language being phraseological... by the end it had risen to 70%!

The answer must be a function of what you count, what you count as the
language, and what you count as an MWE. In particular:

*	are you counting types or tokens? (Exercise: what is the proportion
of multiwords in the mini-corpus comprising the single sentence "Apple pie
is apple pie."?)
*	what sublanguages do you include: all, some, none? ("mid off" is an
MWE for anyone who knows cricket, but not for anyone who doesn't)
*	how much variation (morphological, syntactic, lexical, modifiers)
can there be while it is still the same MWE (or an MWE at all)? (In
Rosamund Moon's example, are "shake in one's shoes", "quake in one's boots"
and "quake in one's Doc Martens" all the same MWE?)
*	is non-compositionality a part of the definition?
*	are frequencies or statistics part of the definition? (Theorists
might not want them to be, but without statistics and thresholds you won't
be able to compute a useful answer; and if you do use them, the answer you
get will depend critically on which statistics and which thresholds you use,
so you had better make principled decisions about them.)
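The types-versus-tokens exercise can be made concrete. Below is a minimal Python sketch, assuming purely for illustration that "apple pie" is the only MWE in a hypothetical lexicon, that matching ignores case, and that punctuation is dropped; the names `MWE_LEXICON`, `segment` and `mwe_proportions` are hypothetical. It computes three defensible answers to the same question for the same sentence.

```python
import re

# Hypothetical lexicon for illustration: "apple pie" is the only known MWE.
MWE_LEXICON = {("apple", "pie")}


def segment(sentence):
    """Lowercase, drop punctuation, and greedily join known two-word MWEs."""
    words = re.findall(r"[a-z]+", sentence.lower())
    units = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MWE_LEXICON:
            units.append(" ".join(pair))  # one unit covering two words
            i += 2
        else:
            units.append(words[i])
            i += 1
    return words, units


def mwe_proportions(sentence):
    words, units = segment(sentence)
    mwe_units = [u for u in units if " " in u]
    return {
        # share of running units (tokens) that are MWEs
        "unit_tokens": len(mwe_units) / len(units),
        # share of running words that fall inside some MWE
        "word_tokens": sum(len(u.split()) for u in mwe_units) / len(words),
        # share of distinct units (types) that are MWEs
        "types": len(set(mwe_units)) / len(set(units)),
    }


print(mwe_proportions("Apple pie is apple pie."))
```

Under these assumptions the one sentence gives 2/3 (MWEs as a share of units), 4/5 (share of words inside an MWE) and 1/2 (share of unit types): three different answers depending only on what is counted.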
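To illustrate how much the answer hangs on statistics and thresholds, here is a minimal Python sketch that scores adjacent word pairs by pointwise mutual information (PMI) and counts how many pairs pass a given PMI cutoff and frequency cutoff. PMI is just one of many possible association measures, chosen here for illustration; the toy corpus and the names `TOY_CORPUS`, `bigram_pmi` and `count_candidates` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus, pre-tokenised with spaces (punctuation kept as tokens).
TOY_CORPUS = "the cat sat on the mat . the cat ate apple pie . she baked apple pie ."


def bigram_pmi(tokens):
    """Map each bigram type to (PMI in bits, bigram frequency)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        p12 = f12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = (math.log2(p12 / (p1 * p2)), f12)
    return scores


def count_candidates(scores, min_pmi, min_freq):
    """Number of bigram types treated as 'MWEs' under the given thresholds."""
    return sum(1 for pmi, freq in scores.values() if pmi >= min_pmi and freq >= min_freq)


scores = bigram_pmi(TOY_CORPUS.split())
for min_pmi in (0.0, 3.0):
    for min_freq in (1, 2):
        print(f"min_pmi={min_pmi}, min_freq={min_freq}: "
              f"{count_candidates(scores, min_pmi, min_freq)}")
```

On this toy corpus the four threshold settings give four different counts (14, 3, 7 and 1), and at frequency 2 the pair "pie ." qualifies alongside "apple pie", which shows how decisions about tokenisation, measures and cutoffs drive the number you end up with.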

There is one view of language in which the 'standard case' is the meaning of
sentences built from the meaning of words, with MWEs being an important kind
of special case. There is another (especially associated with Birmingham)
which looks at things the other way round: language usually comes in larger
chunks, and "free variation" of words is the special case. I quite like the
latter.

Adam Kilgarriff
http://www.kilgarriff.co.uk 

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of David Brooks
Sent: 14 March 2006 11:43
To: Corpora List
Subject: [Corpora-List] Incidence of MWEs

Dear Corpora-folk,

I was wondering if anyone has estimated the incidence of multi-word 
expressions in language. I know that empirical estimates are tied to 
particular corpora, but does anyone have an account of MWEs for 
particular corpora, so that "ball-park" figures of the proportion of 
MWEs can be estimated?

Better yet, can anyone give me a good reference for the incidence of MWEs?

Regards,
David
-- 
David Brooks
http://www.cs.bham.ac.uk/~djb


