Bizarre Google search results (was RE: The not-so-slow death of truthiness?)

Lynne Murphy m.l.murphy at SUSSEX.AC.UK
Wed Aug 23 18:20:56 UTC 2006


As I understand it, Google doesn't actually find the number of hits that it
says it's found.  It finds a bunch, then extrapolates how many there are
likely to be based on how quickly it found that bunch.  It never retrieves
more than 1000 (actually, I've never managed to get more than 990, even if
it claims to have found a million hits).

They also regularly change how they do these things and keep it all a
closely guarded trade secret, so it's very hard to do any kind of
methodologically transparent corpus work with Google.  (But that hasn't
kept me from trying!)

Lynne

--On Wednesday, August 23, 2006 1:11 pm -0500 Clai Rice
<cxr1086 at LOUISIANA.EDU> wrote:

> ---------------------- Information from the mail header -----------------------
> Sender:       American Dialect Society <ADS-L at LISTSERV.UGA.EDU>
> Poster:       Clai Rice <cxr1086 at LOUISIANA.EDU>
> Subject:      Bizarre Google search results (was RE: The not-so-slow death of truthiness?)
> -------------------------------------------------------------------------------
>
> Following up on truthiness in Google Groups searches, I've had a bizarre
> result. From the advanced search page, the query truthiness -colbert
> (lang=English, 50 hits per page, sorted by date, no date limit) returns
> 1,290 hits:
>  Results 1 - 50 of 1,290 for truthiness -  colbert (1.98 seconds)
>
> Then I go to the last display page (to see the earliest uses):
>  Results 201 - 211 of 211 for truthiness -  colbert (0.61 seconds)
>
> So are there 211 or 1290 hits? At the bottom of the last page I click on
> "repeat the search with the omitted results included." and get
>  Results 1 - 50 of 1,580 for truthiness -  colbert (0.57 seconds) !
>
> I navigate to the last display page using the Gooooogle bar at the
> bottom, either going directly to number 10 or descending one page at a
> time, and when I arrive at display page 9 I see:
>  Results 401 - 410 of 410 for truthiness -  colbert (1.10 seconds)
>
> So how many results are there, 211, 410, 1290, or 1580? The same search
> has yielded as few as 980 and as many as 1600 hits, all with the same
> display problem--410 is the maximum that are actually displayed.
>
> Does anyone know anything about a usual discrepancy between reported
> result counts and number of displayed hits?
>
> Clai Rice
>
>> -----Original Message-----
>> From: Dave Wilton [mailto:dave at WILTON.NET]
>> Sent: Friday, August 18, 2006 9:01 AM
>> Subject: Re: The not-so-slow death of truthiness?
>>
>>
>> A search of Technorati, which covers blogs, turns up over
>> 8,000 hits for "truthiness," some 4,000 of which appear if
>> you also search on "-Colbert". A slang term like "truthiness" is
>> more apt to appear in a non-edited format like Usenet or in
>> blogs than in the edited newspapers and journals of Lexis-Nexis.
>>
>> "Truthiness" is not the most popular of new words, but it
>> clearly has a life of its own. I've been encountering it more
>> and more frequently. I think it will be around for a while.
>> (FWIW, I did not vote for it as WOTY.)
>>
>> --Dave Wilton
>>   dave at wilton.net
>>
>> -----Original Message-----
>> From: American Dialect Society
>> [mailto:ADS-L at LISTSERV.UGA.EDU] On Behalf Of Dave Wilton
>> Sent: Thursday, August 17, 2006 6:52 PM
>> To: ADS-L at LISTSERV.UGA.EDU
>> Subject: Re: The not-so-slow death of truthiness?
>>
>> There is a distinction between "Google" and "Google Groups."
>>
>> You are right, there is no way to date-sort Google hits,
>> which are instances of web pages using the search terms. And
>> 1,800 web hits is not all that many.
>>
>> But "Google Groups" is a search of Usenet posts. These can be
>> classified by date and Usenet is a much smaller universe than
>> the web. 1,800 Usenet hits in six months is a significant
>> number. (For comparison, "truthiness" gets some 920,000 web
>> hits; "hamdan" gets some 3 million web hits, but only 1,500
>> Google Groups hits in the last six months; "anthrax" gets
>> over 25 million web hits and some 8,200 Google Groups hits in
>> the six-month period.)
>>
>> To search Google Groups, go to www.google.com and click the
>> "more>>" link, then click the "Groups" option. (It used to be
>> one of the main choices, but has been supplanted by "Images",
>> "Video", and "Maps.")
>>
>> --Dave Wilton
>>   dave at wilton.net
>>
>> -----Original Message-----
>> From: American Dialect Society
>> [mailto:ADS-L at LISTSERV.UGA.EDU] On Behalf Of RonButters at AOL.COM
>> Sent: Thursday, August 17, 2006 7:17 AM
>> To: ADS-L at LISTSERV.UGA.EDU
>> Subject: Re: The not-so-slow death of truthiness?
>>
>> I didn't use Google because I didn't know how to sort out the
>> most recent uses from the earlier ones (when the fad was in
>> full bloom). 1800 Google hits in 6 months does not seem like
>> a lot for a word that was declared "Word of the Year" only 8
>> months ago. And I wonder how many of those hits were in the
>> first four months of the 6.
>>
>> 106 ProQuest hits (those not mentioning Colbert) in 6 months
>> is likewise pretty puny. And same question: how many of those
>> hits were in the first four months of the 6?
>>
>> Given that LexisNexis offers the OPTION of searching the past
>> week and the past month, I assumed that they get the postings
>> online pretty quickly. I do recall that one of the responses
>> was fairly late in August. I WAS using "LexisNexis Academic,"
>> which is not as powerful as the regular LexisNexis (one has to
>> be a law professor at Duke to have access to the regular
>> LexisNexis). Still, the relative figures for the words I
>> searched for should be about the same.
>>
>> To my mind, the Google report simply confirms the view that
>> TRUTHINESS is a mere stunt word that got a lot of publicity,
>> not something that is lexicographically important in its own
>> right--except as an example of how a stunt word can make a
>> brief splash--and how a scholarly society can go giddy in the
>> glare of national publicity (all in good fun, of course).
>>
>> In a message dated 8/17/06 9:44:46 AM, dave at WILTON.NET writes:
>>
>>
>> > You may be searching in the wrong place. Google Groups gives over 1800
>> > hits for "truthiness" over the last six months, including nearly 1500
>> > where "Colbert" doesn't appear in the same post.
>> >
>> > Proquest Newspapers has some 200 hits for the same period, including
>> > 94 that do not have "Colbert."
>> >
>> > How fast does LexisNexis include recent publications? ProQuest has
>> > hits as late as 14 August. If LexisNexis takes its time updating its
>> > database, that may help explain the paucity of hits in recent months.
>> >
>> > --Dave Wilton
>> >   dave at wilton.net
>> >
>> > -----Original Message-----
>> > From: American Dialect Society [mailto:ADS-L at LISTSERV.UGA.EDU] On
>> > Behalf Of RonButters at AOL.COM
>> > Sent: Wednesday, August 16, 2006 6:26 PM
>> > To: ADS-L at LISTSERV.UGA.EDU
>> > Subject: The not-so-slow death of truthiness?
>> >
>> > This caused me to think, "Whatever happened to truthiness?" A quick
>> > check of LexisNexis Academic shows 69 hits in the past six months, 3
>> > in the last month, and 0 in the past week. This makes it about as
>> > well used as LIMPID and only slightly ahead of OTIOSE and RECONDITE.
>> > Franklin Pierce is more popular.
>> >
>> > At least ADS didn't vote it "most likely to succeed." Maybe "Most
>> > likely to suck as a real word" would have been a better category?
>> >
>> > In a message dated 8/16/06 9:31:21 AM, wuxxmupp2000 at YAHOO.COM writes:
>> >
>> >
>> > > No puns on "fictional" allowed !
>> > >
>> > >   JL
>> > >
>> > > Charles Doyle <cdoyle at UGA.EDU> wrote:
>> > >   ---------------------- Information from the mail header -----------------------
>> > > Sender: American Dialect Society
>> > > Poster: Charles Doyle
>> > > Subject: Re: 1851 jest about trad repertoire
>> > > -------------------------------------------------------------------------------
>> > >
>> > > Hmmm. Fictional evidence. Is that a little like truthiness?
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>> > ------------------------------------------------------------
>> > The American Dialect Society - http://www.americandialect.org
>> >
>> >
>> >
>>
>> ------------------------------------------------------------
>> The American Dialect Society - http://www.americandialect.org
>>
>>
>



Dr M Lynne Murphy
Senior Lecturer in Linguistics and English Language
Arts B133
University of Sussex
Brighton BN1 9QN

phone: +44-(0)1273-678844
http://separatedbyacommonlanguage.blogspot.com

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


