[Corpora-List] Incidence of MWEs
Rob Freeman
lists at chaoticlanguage.com
Fri Mar 17 05:39:39 UTC 2006
On Thursday 16 March 2006 23:03, David Brooks wrote:
> ...
> My interest being in syntax, I'm interested in the implications of MWE
> for evaluating parsers. That is to say, if you get something like "light
> pen" in a corpus, it may be tagged as an N-bar, with either a compound
> <N N> or an <Adj N>, but in principle the *syntax* will remain the same
> (tag differences aside).
>
> I would imagine this is not the case for "of course", which doesn't
> strike me as a natural prepositional-phrase; likewise "kick the bucket"
> is /syntactically/ a transitive verb-phrase, but, and here is the core
> of my original (underspecified) question, would it be tagged as a
> transitive verb-phrase, or would it be tagged as an MWE - perhaps an
> intransitive verb-like MWE?
No disrespect intended, David; you are not saying anything different to the
other posts in this thread. It is just that your post presents the common
misconception most clearly.
As the old maxim goes, answers are easy; the difficult part is to find the
right questions.
How do we tag MWE's? Surely the question is whether tags are sensible
parameters of language in the first place.
Yet again tags are causing us problems. Why are we so married to tags?
When will we see that the real answer is that the idea of tags for language
just does not fit? Tags require us to imagine there are two kinds of language,
regular (parametrized by tags) and irregular (enumerated in the lexicon). And
then we have a third kind of language, MWE's, mysterious because it falls
between the two.
So we posit two distinct models, and then agonize over the mystery that real
language instead displays properties which properly belong to neither.
And the problem with that, we insist, is not that real language (MWE's) has
properties of neither model, but rather that we can't extend our models to
fit real language. As if the goal of linguistics is to fit language to
existing models rather than to find models which explain language.
Why bother with two models which don't work, and a separate (unknown) model of
MWE's which fits neither?
It is much simpler to imagine there is one kind of language, MWE's. Forget
tags, explain MWE's and you no longer have a problem.
And MWE's are easy to explain. You can model them as generalizations over
usage. More frequent generalizations look like lexicon, less frequent
generalizations look like syntax.
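As a rough illustration only (a toy sketch, not a method anyone in this
thread has proposed), "generalizations over usage" could be approximated by
counting recurring word sequences and reading frequency as a gradient. The
corpus, the function names and the min_count cutoff below are all invented
for the example.

```python
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def lexicon_like(counts, min_count=2, min_len=2):
    """Multiword sequences seen at least min_count times.

    min_count is a hypothetical cutoff; in practice the point is that
    there is a continuum, not a sharp boundary.
    """
    return {
        seq: c
        for seq, c in counts.items()
        if len(seq) >= min_len and c >= min_count
    }


# Toy corpus, invented for illustration.
corpus = [
    "of course he kicked the bucket",
    "of course she kicked the ball",
    "he dropped the light pen",
]

counts = ngram_counts(corpus)
frequent = lexicon_like(counts)

for seq, c in sorted(frequent.items(), key=lambda item: -item[1]):
    print(" ".join(seq), c)
```

On this tiny sample the sequences that recur ("of course", "kicked the") are
the ones that would start to look lexical, while sequences seen once remain
candidates for more abstract, syntax-like generalization.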
Instead of worrying where MWE's start and stop, let's accept that MWE's cover
all of language. All language is an MWE. Explain MWE's in terms of
generalizations over usage and let's start thinking about how we can use
these generalizations over usage, rather than worrying about defining where
MWE's stop or start, and how they should be tagged.
Or we can continue to debate the "problem" of MWE's.
-Rob