[Corpora-List] Incidence of MWEs

David Brooks D.J.Brooks at cs.bham.ac.uk
Thu Mar 16 10:03:52 UTC 2006


Chris Butler wrote:
> I notice that recent postings on this topic are concerned largely with the
> matter of opacity of meaning in MWEs - Robert Amsler's working principle "if
> you can predict its meaning from its constituent parts, it
> doesn't need a separate entry" effectively equates MWE with the traditional
> idiom.

Yes, and I fear this is my fault. I realise there is some difference of 
opinion on many of the matters discussed so far, and I perhaps should 
have narrowed this topic down to something more manageable.

My interest is in syntax, and specifically in the implications of MWEs
for evaluating parsers. That is to say, if you get something like "light
pen" in a corpus, it may be tagged as an N-bar, as either a compound
<N N> or an <Adj N>, but in principle the *syntax* will remain the same
(tag differences aside).

I would imagine this is not the case for "of course", which doesn't
strike me as a natural prepositional phrase. Likewise, "kick the bucket"
is *syntactically* a transitive verb phrase. But, and here is the core
of my original (underspecified) question, would it be tagged as a
transitive verb phrase, or as an MWE, perhaps an intransitive verb-like
MWE?

The reason I ask is that for things like PARSEVAL, this is going to have 
an impact on constituent bracket scores, and I was wondering to what 
extent it had been investigated, and how noticeable the effect of MWEs 
might be.
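
To make the bracket-score concern concrete, here is a toy sketch in
Python. Everything in it is an illustrative assumption: the function
name, the (label, start, end) span encoding, and both analyses of
"he kicked the bucket" are made up for the example and are not taken
from any treebank or from the actual PARSEVAL/evalb tooling. It only
shows how one extra internal bracket changes labelled precision.

```python
from collections import Counter

def parseval(gold, predicted):
    """Labelled bracket precision, recall and F1.

    Both arguments are lists of (label, start, end) spans; this is a
    simplified stand-in for PARSEVAL-style scoring, not evalb itself.
    """
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection: a bracket matches at most as often as it
    # occurs in both analyses.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# "he kicked the bucket", token offsets 0..4 (hypothetical analyses).
# Compositional: the VP contains an object NP "the bucket".
compositional = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
# MWE-style: "kicked the bucket" is one flat, intransitive-like unit.
mwe_flat = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4)]

# Score a compositional parse against a gold standard that treats the
# idiom as a flat MWE.
precision, recall, f1 = parseval(gold=mwe_flat, predicted=compositional)
print(precision, recall, f1)
```

Under these assumptions the parser is penalised for the object NP
bracket (precision 0.75, recall 1.0), even though its analysis is the
syntactically regular one. Swapping the roles of the two analyses
penalises recall instead.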

So, I guess I'm principally interested in MWEs that cause a syntactic
variation (from the compositional norm), and whether or not they are
tagged as such in treebanks. Still, it's been quite an enlightening
debate...

D
-- 
David Brooks
http://www.cs.bham.ac.uk/~djb


