gbarrett at WORLDNEWYORK.ORG
Mon Nov 7 21:24:50 UTC 2005
Arnold, you are, of course, right. In this case, though, couldn't all
uses of the word "gravitas" count towards the total count, whether
they are multiple quotations of a single usage of the word, multiple
iterations of the same wire stories in different publications, or
discussions of the word itself? The reporter's premise, as he put it
to me on the phone, was that the word *seemed* to be more common. The
Factiva numbers support that, since all occurrences of the word he
saw or heard would add to his impression. His growing feeling that
the word was more common would not necessarily distinguish between
repeats, redundancies, or discussions of the word. His mistake was
using those numbers to bolster his argument, which turned a statement
of impression/feeling/opinion into a statement of fact.
We should be able to account for the count problem by comparing the
count with that of a very common word such as "the," although on LexisNexis
Academic this is impossible because of its stop words (meaning "the"
is automatically skipped in searches). Factiva *does* appear to
permit searching for "the." I think we can account in some small way
for discussions of the word gravitas by eliminating those hits that
include such phrases as "meaning of (the word) gravitas," "definition
of (the word) gravitas," "origin of the word/term gravitas," etc.
Without spending weeks on it and hiring a staff, I see no easy way to
account for multiple quotations of a single usage of the word
("Cheney said, 'Bush has the gravitas necessary to be president.'")
or multiple iterations of the same wire stories in different
publications.
I don't have the time to redo the searches using the strategies you
suggest, but my bet is that we'd still see the same spikes in usage
at the time of the presidential elections, and an overall trend for
more usage, especially when compared to 1999 and earlier. Beer at the
ADS meeting in Albuquerque for anyone who does indeed run the numbers.
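For anyone who does take up the offer, here is a minimal sketch of the two adjustments discussed above: dropping hits that are discussions of the word, and normalizing raw counts by the count of "the" for the same period. The function names, the regular expression, the snippets, and all the figures are hypothetical illustrations, not Factiva output or anyone's actual method. The rate here is per 100,000 hits of "the", which is only an assumption about how one might scale it.

```python
import re

# Hypothetical pattern for hits that discuss the word rather than use it,
# e.g. "meaning of the word gravitas", "origin of the term gravitas".
METALINGUISTIC = re.compile(
    r"\b(?:meaning|definition|origin)\s+of\s+"
    r"(?:the\s+(?:word|term)\s+)?[\"']?gravitas\b",
    re.IGNORECASE,
)


def is_discussion(snippet):
    """Return True if the snippet appears to talk about the word itself."""
    return METALINGUISTIC.search(snippet) is not None


def normalized_rate(target_hits, the_hits, scale=100_000):
    """Target hits per `scale` hits of "the" in the same period and sources."""
    if the_hits <= 0:
        raise ValueError("need a positive count for the normalizing word")
    return target_hits * scale / the_hits


# Made-up example snippets, not real search results.
snippets = [
    "Cheney said Bush has the gravitas necessary to be president.",
    "Readers wrote in to ask about the meaning of the word gravitas.",
    "The origin of the term 'gravitas' is Latin.",
]
primary = [s for s in snippets if not is_discussion(s)]

# Made-up yearly counts: (hits for "gravitas", hits for "the").
counts = {1999: (300, 6_000_000), 2004: (900, 9_000_000)}
rates = {year: normalized_rate(g, t) for year, (g, t) in counts.items()}
```

With these invented figures the raw count triples while the normalized rate only doubles, which is the kind of gap the normalization is meant to expose. Repeated quotations and duplicated wire stories would still need sampling or hand-checking; this sketch does nothing about them.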
On Nov 7, 2005, at 15:19, Arnold M. Zwicky wrote:
> On Nov 7, 2005, at 11:12 AM, Grant Barrett wrote:
>> ... The proper way to do such data-gathering would have been to
>> search a set group of newspapers over that same period of time.
> even that isn't enough, unless the number of pages (or words)
> searched remains constant over the search period. one way to fix
> that is to normalize by dividing the raw figures for a period by some
> measure of the number of words searched in that period. in our work
> at stanford on quotative "all" in google groups, we (well tom wasow)
> used a search on the word "the" for this purpose, and then multiplied
> by a constant to get numbers in some reasonable range (ultimately,
> estimates of quotative "all" per 100,000 pages).
> at my suggestion, tom then tried a number of other very common words
> for normalization, and got results almost scarily close to the ones
> for "the". so we think "the" is a pretty good normalizer.
> one of the things we had to do in counting quotative "all"
> occurrences was to remove examples in discussions *about* the word,
> which become fairly frequent when the usage does. i'd expect a
> similar problem with "gravitas" citations. "gravitas" has another
> problem that is unlikely to be significant for quotative "all":
> quotations of previous uses. these two effects would contribute to
> an increase in "gravitas" hits over time (at least for a while), even
> if people were not producing more primary occurrences.
> sampling the data could yield an estimate of these effects for
> "gravitas", if hand-searching turns out to be onerous.
> i suspect that the size of these effects isn't constant across vogue
> words and innovative usages, so they'd have to be estimated for each