[Corpora-List] Incidence of MWEs

Mike Maxwell maxwell at ldc.upenn.edu
Sun Mar 19 15:55:01 UTC 2006


Rob Freeman wrote:
> Surely the question is are tags sensible parameters of language in 
> the first place.

I am not sure what you mean by "parameters".  I believe the original
idea about tags, which you will find in textbooks, is that tags
allow us to make approximations to real language, approximations that
are useful in certain kinds of computation.  If we had a complete
understanding of the mental processing underlying language (and
arguably, pragmatics and everything else), and maybe much more computing
power, we wouldn't need tags.  (But I think we would need syntax and
morphology and a host of other things that linguists have traditionally
studied.)  I don't believe most researchers would consider tags to be a 
theoretical construct--they're an engineering construct.

Having said that, tags frequently bear an obvious relationship to parts 
of speech (aka categories, such as noun, verb...) and morphosyntactic 
features (past tense, plural subject...).  And these are linguistic/scientific 
notions (although one may of course argue how well motivated any 
particular one is).  They allow us to draw generalizations.

> Instead of worrying where MWE's start and stop, let's accept that 
> MWE's cover all of language. All language is an MWE.

Except for this, which isn't an MWE.  And except for your posting, which
isn't an MWE either (at least not one that I've ever seen before).

> Explain MWE's in terms of generalizations over usage and let's start 
> thinking about how we can use these generalizations over usage

Uh, let's see.  Here's a generalization over usage: the MWE "kick the 
bucket" has a distribution much like the MWE "fire off a shot", which 
has a distribution much like the MWE "pick up the pace", etc.  Let's 
make up a label for these MWEs that obey this generalization--I dunno, 
maybe "VP".

Then we notice the generalization that there are lots of variants of 
each of these MWEs where the first word has an 's' (or 'es') on the end, 
or an 'ing' on the end, or a 'd' (or 'ed').  Let's call that word a "V", 
and the things that go on the end "verbal suffixes".  And we may also 
notice the generalization that a 'V' can be immediately followed by a 
'verbal suffix'.
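
To make that second generalization concrete, here is a rough sketch in
Python.  It is only an illustration under stated assumptions: the function
names (analyze, group_variants), the stem list, and the rule that a final
'e' drops before 'ing' are all made up for the example, and it is not a
serious morphological analyzer.

```python
from collections import defaultdict

# The suffixes named above; the empty string covers the bare form.
VERBAL_SUFFIXES = ("", "s", "es", "ing", "d", "ed")


def analyze(word, stems):
    """Return (stem, suffix) if word is a known stem plus a verbal suffix, else None."""
    for stem in stems:
        for suffix in VERBAL_SUFFIXES:
            if word == stem + suffix:
                return stem, suffix
            # Assumption: a stem-final 'e' drops before 'ing' (fire -> firing).
            if suffix == "ing" and stem.endswith("e") and word == stem[:-1] + suffix:
                return stem, suffix
    return None


def group_variants(phrases, stems):
    """Group phrases that differ only in the suffix on their first word."""
    groups = defaultdict(set)
    for phrase in phrases:
        first, _, rest = phrase.partition(" ")
        analysis = analyze(first, stems)
        if analysis is not None:
            stem, suffix = analysis
            groups[(stem, rest)].add(suffix)
    return dict(groups)


phrases = [
    "kick the bucket", "kicked the bucket", "kicking the bucket",
    "fires off a shot", "firing off a shot", "picked up the pace",
]
stems = ("kick", "fire", "pick")  # hypothetical list of "V"s
groups = group_variants(phrases, stems)
```

Each key in groups pairs a "V" with the rest of its MWE, and the value is
the set of "verbal suffixes" seen on that V, which is the generalization in
miniature.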

Oh, but those Vs we noticed also take part in other MWEs--and for that
matter, in things that don't look particularly like MWEs, in the sense 
that there's not much repeated at the word level, only at the category 
level.  So we'll call all those VPs, too.

I am reminded of a story I saw more years ago than I care to say, in a
caving newsletter (this was before the days of blogs, which gives you an 
idea of how ancient it was).  The idea was that there was a danger in 
using a single rope (this was for pit caves, where you have a free 
descent): the rope might rub across a rock, and fray.  So it would be 
safer to use two ropes, so that if one broke while you were halfway up 
(or down), you'd still have the other.  But if two ropes are twice as 
safe as one, three ropes would add another 50% safety margin.  And so forth.

But of course the chance of multiple ropes failing at the same time is
very small, and a large number of ropes is heavy.  So you could reduce
the diameter (and therefore the weight) of each rope, while still
maintaining an adequate safety margin.

But then, all those cords get difficult to manage--they tangle.  Ah, but
you could overcome that problem by braiding the cords together!

As you've probably guessed, there is a moral to this, which happens to 
be an MWE: what the left hand takes away, the right hand gives back. 
(But I find the non-MWE version far more entertaining.)  Anyway, I 
suspect that an MWE version of language will end up looking an awful lot 
like some existing theories of syntax and morphology--HPSG, maybe.

    Mike Maxwell


