[Corpora-List] Incidence of MWEs

Andy Roberts andyr at comp.leeds.ac.uk
Fri Mar 17 09:29:46 UTC 2006


On Thu, 16 Mar 2006, Adam Kilgarriff wrote:

> Bob Amsler says:
>
>> I have found published dictionaries' judgments as to what constitutes an
>> MWE to be both dated and biased against declaring MWEs to exist.
>> ...
>> Take an MWE such as "pencil sharpener". Most dictionaries don't ...
>
> UK dictionaries on my shelf do list "pencil sharpener" (Oxford D of E 98,
> LDOCE 95, Macmillan E D 02).  US ones (Random House 1987, M-W online) don't.
> Moral is clear.
>
> US dictionaries are ***way, way*** behind UK dictionaries in corpus use.  UK
> dictionary publishers lead the world in corpus development and use (with NLP
> lagging behind).  OUP and Longman were prime movers in developing the BNC,
> and OUP is now on the point of launching its billion-word corpus of English.
> Collins-COBUILD was the great pioneer in the 1980s.  Macmillan was first
> user of my very own word sketches (corpus analysis software).
>

I think we should remember that dictionary publishers are often working
within the constraints of traditional paper printing. Physical space is
clearly one such constraint. Therefore, regardless of how many MWEs the
editors know of, there will be an inevitable culling in order to deliver
a 'pickupable' product.

I can't speak for others, but I know that Longman American dictionaries
are corpus-driven too. You'll find 'pencil sharpener' in the Longman
Advanced American Dictionary (the US equivalent of LDOCE).

Andy Roberts


