Google's (and Yahoo's) hit counts are estimates, and you shouldn't rely on them too much. I once tried to use them to recreate Magnini et al.'s work "Is It the Right Answer? Exploiting Web Redundancy for Answer Validation", but I gave up because of the inconsistencies in the hit counts returned by Google and Yahoo. This was back in 2009, but I would be surprised if things were different now.<br>
<br>Diego<br><br>
<div class="gmail_extra"><br><br><div class="gmail_quote">On 28 November 2012 10:18, Trevor Jenkins <span dir="ltr"><<a href="mailto:trevor.jenkins@suneidesis.com" target="_blank">trevor.jenkins@suneidesis.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 27 Nov 2012, at 23:00, John F Sowa <<a href="mailto:sowa@bestweb.net">sowa@bestweb.net</a>> wrote:<br>
<br>
> In ancient times (pre 21st century), Google supported Boolean<br>
> expressions for searching. But now it's impossible to control<br>
> their search in any predictable fashion.<br>
<br>
Google's implementation of Boolean expressions was never that good anyway. Their NOT (the - sign) never really worked as a Boolean NOT; it was more of a "we'll disregard your request if we feel like it". Couple that with the lack of any (working) collocation features, and it's a poor excuse for a text/document retrieval system.<br>
<br>
> But when I type just "enterprise integration pattern" by itself,<br>
> I get 114,000 hits. When I add another word, the number should<br>
> decrease. But the following combination gets 137,000 hits:<br>
<br>
There also used to be (and probably still is) a hidden "feature": Google would terminate searches after some time slice. Even if more hits were available, you didn't see them. This used to be simple to demonstrate by submitting the same search request several times in quick succession: you never got the same answer twice. The only result counts you can believe are zero and one; anything else is practically non-deterministic.<br>
<br>
> Does anybody know how to bypass the Google heuristics and<br>
> force it to use a simple regular expression for searching?<br>
<br>
Sadly, no, other than using a search engine with a better search system behind it. Unfortunately, Google has, for the moment, the largest cache of web pages and documents.<br>
<br>
Personally, I question whether Google is still a search engine rather than a targeted-adverts engine these days. (Thank god for browser add-ons like AdBlockPlus, Ghostery, GreaseMonkey and the like for squelching those nasty adverts.)<br>
<br>
[I should declare a commercial interest here: I worked for Paralog, who produced one of the best … no, *the* best text retrieval system, trip. The product still exists, although I've not been associated with it for over a decade. But it still remains the best there is, if you can afford to purchase it.]<br>
<br>
Regards, Trevor.<br>
<br>
<>< Re: deemed!<br>
<br>
<br>
_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</blockquote></div><br><br clear="all"><br>-- <br><div> This message is intended for the addressee named and may<br> contain confidential information. If you are not the intended<br> recipient, please delete it and notify the sender. Views expressed<br>
in this message are those of the individual sender, and are not<br> necessarily the views of Macquarie University.<br>---------------------------------------------------------------------<br>Dr. Diego MOLLA ALIOD <a href="mailto:diego.molla-aliod@mq.edu.au" target="_blank">diego.molla-aliod@mq.edu.au</a><br>
Department of Computing <a href="http://web.science.mq.edu.au/%7Ediego" target="_blank">http://web.science.mq.edu.au/~diego</a><br>Macquarie University </div><br>
</div>