[Corpora-List] Incidence of MWEs

Rob Freeman lists at chaoticlanguage.com
Fri Mar 17 05:39:39 UTC 2006


On Thursday 16 March 2006 23:03, David Brooks wrote:
> ...
> My interest being in syntax, I'm interested in the implications of MWE
> for evaluating parsers. That is to say, if you get something like "light
> pen" in a corpus, it may be tagged as an N-bar, with either a compound
> <N N> or an <Adj N>, but in principle the *syntax* will remain the same
> (tag differences aside).
>
> I would imagine this is not the case for "of course", which doesn't
> strike me as a natural prepositional-phrase; likewise "kick the bucket"
> is /syntactically/ a transitive verb-phrase, but, and here is the core
> of my original (underspecified) question, would it be tagged as a
> transitive verb-phrase, or would it be tagged as an MWE - perhaps an
> intransitive verb-like MWE?

No disrespect intended, David. You are not saying anything different from
the other posts in this thread. It is just that your post presents the
common misconception most clearly.

As the old maxim goes, answers are easy; the difficult part is finding the
right questions.

How do we tag MWE's? Surely the question is whether tags are sensible
parameters of language in the first place.

Yet again tags are causing us problems. Why are we so married to tags?

When will we see that the real answer is that the idea of tags for language
just does not fit? Tags require us to imagine there are two kinds of
language, regular (parametrized by tags) and irregular (enumerated in the
lexicon). And then we have a third kind of language, MWE's, mysterious
because they fall between the two.

So we posit two distinct models, and then agonize over the mystery that real
language displays properties which properly belong to neither.

And the problem with that, we insist, is not that real language (MWE's) has 
properties of neither model, but rather that we can't extend our models to 
fit real language. As if the goal of linguistics is to fit language to 
existing models rather than to find models which explain language.

Why bother with two models which don't work, and a separate (unknown) model
of MWE's which fits neither?

It is much simpler to imagine there is one kind of language, MWE's. Forget 
tags, explain MWE's and you no longer have a problem.

And MWE's are easy to explain. You can model them as generalizations over 
usage. More frequent generalizations look like lexicon, less frequent 
generalizations look like syntax.
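
To make that concrete, here is a minimal sketch. It is only an illustration:
the toy corpus is invented, the names (pair_counts, fillers, describe) are
hypothetical, and counting word pairs is a crude stand-in for
"generalization over usage", not a worked-out model. It records how often
each word pair occurs and how many different words have been seen after each
first word. A pair that is frequent and whose first word allows only one
continuation behaves like a lexical entry; a pair whose first word is
followed by many different words behaves like an open, syntax-like slot.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = [
    "of course he kicked the bucket",
    "of course she kicked the ball",
    "of course they kicked the door",
    "he dropped the bucket again",
]

pair_counts = Counter()        # (left, right) -> how often the pair occurs
fillers = defaultdict(set)     # left -> set of words observed right after it

for sentence in corpus:
    tokens = sentence.split()
    for left, right in zip(tokens, tokens[1:]):
        pair_counts[(left, right)] += 1
        fillers[left].add(right)


def describe(left, right):
    """Return (frequency of the pair, number of distinct words seen after left)."""
    return pair_counts[(left, right)], len(fillers[left])


for pair in [("of", "course"), ("the", "ball")]:
    freq, variety = describe(*pair)
    print(pair, "frequency:", freq, "distinct continuations:", variety)
```

On this toy data "of course" is frequent and "of" has a single continuation,
so it looks like lexicon, while "the" is followed by several different
nouns, so "the ball" looks like an instance of a more general pattern. Both
fall out of the same counts; no tags are involved.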

Instead of worrying about where MWE's start and stop, or how they should be
tagged, let's accept that MWE's cover all of language. All language is an
MWE. Explain MWE's in terms of generalizations over usage, and let's start
thinking about how we can use those generalizations.

Or we can continue to debate the "problem" of MWE's.

-Rob


