[Lexicog] Percentage of idioms vs single words

Patrick Hanks hanks at BBAW.DE
Wed Feb 4 19:50:20 UTC 2004


As a fellow inspiree of John Sinclair, I'd like to add something to Phillippe Humblé's remarks about multi-word expressions.

MWEs are not well covered in English dictionaries. What Phillippe is doing in an electronic dictionary re MWEs in Portuguese is very interesting: another assault on reductionism. Hurrah for Phillippe! English - a "well documented" language as far as its vocabulary is concerned - has tens of thousands of conventional MWEs that have never been documented in any dictionary. Why not? "Well," says the dictionary publisher, "putting them all in the dictionary would A) cost too much in lexicographer time; B) make the dictionary unacceptably large; C) be impossible because we don't have data on all of them and new ones are being created all the time; and D) be unnecessary because they are really part of the grammar, not the lexicon - an 'electric fire' is just a type of fire ..."

A) is a good point. Collecting and defining tens of thousands of MWEs systematically would certainly add greatly to already strained compilation costs, even if one spent only five lexicographer-minutes on each one.
B) was a good point before the advent of on-line dictionaries removed space constraints.
C) is true but irrelevant -- the ideal on-line dict. would collect and define the MWEs that ARE available, and allow for adding more as they become available. 
D) is not true, unfortunately. If it were true, 'forest fire' would be synonymous with 'wood fire'. But both these MWEs (which are not in ordinary dictionaries) have distinctive conventional meanings, which an ideal dictionary would state explicitly. A forest fire is out there in the forest, and a wood fire is at home in your house (or in a camp, for cooking).

* * *

And something on "authentic examples" vs. "invented examples": 

Phillippe and I agree that the contents of a dictionary are determined at least in part by the target audience. AND that authenticity (i.e. being found in a corpus) alone is not enough. Evidence of conventionality is also needed. 

But I see no conflict between a good product and a good theory. Certainly, bad theories (of which there are plenty, and very popular they are too) can lead to bad products. Complete absence of theory also leads to bad products. If you press the editor of a good dictionary about its theoretical foundations, you generally find that there ARE theoretical foundations, even though they may not be (indeed, almost certainly are not) anything like received linguistic dogma, and even though the editor may deny having any theoretical basis at all.

Thinking back to the 70s (a time when I still believed in introspection as a technique for inventing illustrative data) -- well, if I had a penny for every bad example that my colleagues and I took out of Collins English Dictionary -- a pre-corpus dictionary, despite attempts by the publishers to claim or imply otherwise -- on the grounds that it (the invented example) was unnatural, I'd be a rich man. And still there are unnatural examples that slipped through.

I am sorry to hear that the "naive authenticity" view re-surfaced on Cobuild 2. It lurked on Cobuild 1. You do indeed need a very large number of examples to be able to select at least one example for each sense of each word if the selected example is to be both natural and well focused (i.e. short).  I think this is because writers do not simply repeat the normal uses of language -- they EXPLOIT them in order to say new and interesting things, or to say old things in new and interesting ways.

In this connection, Judy Kegl has a wonderful story about being urged to look in a corpus to find authentic, natural data... The word they were discussing was BAKE and the corpus that they had available was Associated Press 1980something.  So Judy looked, and the very first example she found was:

"Always vacuum your moose from the snout up, and brush your pheasant with crumbs of freshly BAKED bread, torn not sliced."  (It was a quotation from the New England Journal of Taxidermy or some such publication.)

"THAT's authentic?!   I'm supposed to use THAT as a model of normal English usage?"

