[Corpora-List] Google "region"-based searches

Diego Molla-Aliod diego.molla-aliod at mq.edu.au
Wed Nov 28 03:31:57 UTC 2012


Google's (and Yahoo's) hit counts are estimates and you shouldn't rely on
them too much. I once tried to use them to recreate Magnini et al.'s
work "Is It the Right Answer? Exploiting Web Redundancy for Answer
Validation", but I gave up because of the inconsistencies in the hit counts
returned by Google and Yahoo. This was back in 2009, but I would be surprised
if things were different now.
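
One way to make such inconsistencies concrete is to test the reported counts against a property that exact counts would have to satisfy: requiring an extra term can never increase the number of matching pages. The following is only a minimal sketch under that assumption. The function names and the data layout are hypothetical, nothing here queries a real search engine, and the two counts are the ones quoted further down in this thread (the added word is not given there, so it appears as a placeholder).

```python
from statistics import mean, pstdev


def monotonicity_violations(counts):
    """Return (broad, narrow) query pairs whose reported counts are inconsistent.

    `counts` maps a frozenset of required terms to the hit count a search
    engine reported for the conjunction of those terms. Requiring extra
    terms can only shrink the true result set, so a narrower query that
    reports MORE hits than a broader one signals an unreliable estimate.
    """
    violations = []
    for broad, broad_hits in counts.items():
        for narrow, narrow_hits in counts.items():
            # `broad < narrow` is a proper-subset test on the term sets.
            if broad < narrow and narrow_hits > broad_hits:
                violations.append((broad, narrow))
    return violations


def relative_spread(samples):
    """Coefficient of variation of repeated counts for one identical query."""
    average = mean(samples)
    return pstdev(samples) / average if average else 0.0


# Counts quoted in this thread; the added word is not given there, so
# "<extra word>" is a placeholder (assumption).
phrase = frozenset({'"enterprise integration pattern"'})
phrase_plus_word = frozenset({'"enterprise integration pattern"', "<extra word>"})
reported = {phrase: 114_000, phrase_plus_word: 137_000}

for broad, narrow in monotonicity_violations(reported):
    print(f"{sorted(narrow)} reports more hits than {sorted(broad)}")
```

The second helper, relative_spread, is meant for counts collected by repeating one identical query: a value well above zero would suggest the numbers are too unstable to use as frequencies.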

Diego



On 28 November 2012 10:18, Trevor Jenkins <trevor.jenkins at suneidesis.com> wrote:

> On 27 Nov 2012, at 23:00, John F Sowa <sowa at bestweb.net> wrote:
>
> > In ancient times (pre 21st century), Google supported Boolean
> > expressions for searching.  But now it's impossible to control
> > their search in any predictable fashion.
>
> Google's implementation of Boolean expressions was never that good anyway.
> Their NOT (the - sign) never really worked as a Boolean NOT; it was more of a
> "we'll disregard your request if we feel like it". Couple that with the
> lack of any (working) collocation features and it's a poor excuse for a
> text/document retrieval system.
>
> > But when I type just "enterprise integration pattern" by itself,
> > I get 114,000 hits.  When I add another word, the number should
> > decrease.  But the following combination gets 137,000 hits:
>
> There also used to be, and probably still is, a hidden "feature" in that Google
> would terminate searches after some time slice. Even if there were more
> hits available you didn't see them. It used to be simple to demonstrate by
> submitting the same search request several times in quick succession: never
> the same answer twice. The only numbers of results that you can believe are
> zero and one; anything else is practically non-deterministic.
>
> > Does anybody know how to bypass the Google heuristics and
> > force it to use a simple regular expression for searching?
>
> Sadly no. Other than using a search engine with a better search system
> behind it. But unfortunately Google has, for the moment, the largest cache
> of web pages and documents.
>
> Personally I question whether Google is still a search engine, more a
> targeted adverts engine these days. (Thank god for browser add-ons like
> AdBlockPlus, Ghostery, GreaseMonkey and their like for squelching those
> nasty adverts.)
>
> [I should declare a commercial interest here: I worked for Paralog, who
> produced one of the best … no, *the* best text retrieval system, TRIP.
> The product still exists, although I've not been associated with it for
> over a decade. But it still remains the best there is, if you can afford
> to purchase it.]
>
> Regards, Trevor.
>
> <>< Re: deemed!
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
  This message is intended for the addressee named and may
  contain confidential information.  If you are not the intended
  recipient, please delete it and notify the sender.  Views expressed
  in this message are those of the individual sender, and are not
  necessarily the views of Macquarie University.
---------------------------------------------------------------------
Dr. Diego MOLLA ALIOD                     diego.molla-aliod at mq.edu.au
Department of Computing          http://web.science.mq.edu.au/~diego
Macquarie University