Arnold M. Zwicky
zwicky at CSLI.STANFORD.EDU
Tue Nov 8 19:01:07 UTC 2005
On Nov 7, 2005, at 9:54 PM, Geoff Nunberg wrote:
> I've found that, as Arnold suggests, you can use a collection of
> high-frequency, high-diffusion items as a proxy for database size --
> items like "among," "fix," "book," "behalf" and so forth.
the ones tom wasow ended up using were:
said, other, make, like, time, look, write, see, go, number, way,
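a minimal sketch of the proxy idea, for anyone who wants to see the arithmetic: divide the hits for the item of interest by the average hits for a few proxy items searched in the same database slice. the period labels, the function name, and every count below are invented for illustration; only the proxy words come from the list above.

```python
from statistics import mean

# proxy items are from the list above; all counts are invented for illustration
PROXY_HITS = {
    "early": {"said": 200_000, "other": 150_000, "make": 100_000},
    "late": {"said": 400_000, "other": 300_000, "make": 200_000},
}
# hypothetical raw hits for the item of interest in each period
TARGET_HITS = {"early": 30, "late": 45}


def per_million_proxy_hits(target_hits, proxy_hits):
    """Scale a raw count by the mean proxy count, as a stand-in for database size."""
    baseline = mean(proxy_hits.values())
    return target_hits * 1_000_000 / baseline


rates = {
    period: per_million_proxy_hits(TARGET_HITS[period], PROXY_HITS[period])
    for period in TARGET_HITS
}
for period, rate in rates.items():
    print(period, rate)
```

with these made-up numbers the raw count rises from 30 to 45 while the normalized rate falls, because the database doubled in size over the same stretch.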
meanwhile, from grant barrett:
>> Arnold, you are, of course, right. In this case, though, couldn't all
>> uses of the word "gravitas" count towards the total count, whether
>> they are multiple quotations of a single usage of the word, multiple
>> iterations of the same wire stories in different publications, or
>> discussions of the word itself? The reporter's premise as he put it
>> to me on the phone was that the word *seemed* to be more common...
(james smith has now made a similar point.)
interesting question. there are really two kinds of frequency
here. one is raw frequency, and you can argue that ordinary people's
perceptions of the frequency of an item are affected by all
occurrences of the item that they hear/read. the other is the
frequency with which the item is actually used (rather than mentioned
or quoted) -- the kind of frequency that *linguists* are interested in.
these frequencies could differ quite a bit in particular cases.
consider nominative coordinate objects ("between you and i"). pretty
much everybody thinks these are common, but in fact, as far as i
know, everybody who's looked at corpora of informal speech and
writing finds nco's to have only a modest frequency -- linguists'
frequency, frequency of use -- so modest, in fact, that it's not easy
to do statistics. in writing about the Frequency Illusion on
Language Log, i attributed most of this disparity between beliefs and
use data to a selective attention effect, and i'm convinced this does
play an important role. but it's also true that nco's are mentioned
in practically every discussion of current english usage; james
cochrane entitled his recent collection of language complaints
Between You and I, and lists of linguistic pet peeves (which are
generated in the most unlikely places, like newsgroups for lgbt folk,
mailing lists for mothers, otherwise focused on child care, and blogs
about the vicissitudes of adolescent life) harrow the nco ground
afresh every few months, thus boosting the raw frequency of nco's, as
well as making them noticeable.
meanwhile, the press attention given to cochrane's book, with the
concomitant discussion in blogs, newsgroups, and mailing lists, has
undoubtedly given a hefty boost to the raw frequency of "between you
and i" (130,000 raw google webhits, very few of which are actual
uses). for an entertaining comparison, check out "eats shoots", an
expression whose frequency of use surely was very low until recently,
but which now gets 406,000 raw webhits, thanks almost entirely to
so when we talk about frequencies, i suppose we need to ask what our
purposes are. and maybe we should try to do things both ways.
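one way to do things both ways, sketched under stated assumptions: keep the raw hit count, hand-label a small sample of hits as use or mention, and scale the raw count by the share of uses in the sample. the 130,000 figure is the raw count quoted above; the sample labels and the function name are invented, and a real sample would have to be drawn at random and be much larger.

```python
def estimate_use_hits(raw_hits, sample_labels):
    """Return (share of uses in the sample, estimated number of hits that are uses)."""
    uses = sum(1 for label in sample_labels if label == "use")
    share = uses / len(sample_labels)
    return share, round(raw_hits * share)


# invented hand-labelled sample: 18 mentions or quotations, 2 actual uses
sample = ["mention"] * 18 + ["use"] * 2

# 130,000 is the raw webhit count quoted above for "between you and i"
share, estimated_uses = estimate_use_hits(130_000, sample)
print(share, estimated_uses)
```

the raw number speaks to what ordinary people are exposed to; the scaled number is a rough stand-in for the linguists' frequency, and is only as good as the sample behind it.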
there is then the question of how we talk about these things to