[Corpora-List] Google "region"-based searches

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Wed Nov 28 09:49:12 UTC 2012


Greetings.

On 28/11/12 12:00 AM, John F Sowa wrote:
> In ancient times (pre 21st century), Google supported Boolean
> expressions for searching.  But now it's impossible to control
> their search in any predictable fashion.
> 
> For example, I wanted to count the number of web pages that used
> the phrase "enterprise integration pattern" and the word 'sql'.
> 
> But when I type just "enterprise integration pattern" by itself,
> I get 114,000 hits.  When I add another word, the number should
> decrease.  But the following combination gets 137,000 hits:
> 
>    "enterprise integration pattern" sql
> 
> The following combination gets 274,000 hits:
> 
>    "enterprise integration pattern" java
> 
> And the following gets 25,900,000 hits:
> 
>    "enterprise integration pattern" java sql
> 
> I get the same numbers with a one-line search or with
> their so-called advanced search.
> 
> Does anybody know how to bypass the Google heuristics and
> force it to use a simple regular expression for searching?

Google used to support a "+" modifier for search terms; prefixing a term
with it instructed the search to return only pages that actually contain
that term.  (Without the modifier, Google was free to disregard search
terms at its discretion.)  The "+" modifier was dropped, probably for
marketing reasons, once Google+ was introduced.  Supposedly you can now
achieve the same effect by putting each "required" term in quotation
marks, and in my experience this works most of the time.  For your
examples, it appears that sometimes it does and sometimes it doesn't:

   "enterprise integration pattern"

gets 117,000 hits, but oddly both

   "enterprise integration pattern" sql

and

   "enterprise integration pattern" "sql"

get 137,000 results.  On the other hand,

   "enterprise integration pattern" java sql

gets 25,800,000 results, but

   "enterprise integration pattern" "java" "sql"

returns a more sensible 8,520 results.
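
If you need to build such queries for many term combinations, the
quoting can be automated.  The following is only a minimal sketch in
Python: the helper name require_all is made up for illustration, and it
assumes nothing about Google beyond the quoting behaviour described
above, which, as the numbers show, is not guaranteed to be honoured.

```python
from urllib.parse import urlencode


def require_all(terms):
    """Wrap every term or phrase in double quotes so each is marked as required."""
    quoted = []
    for term in terms:
        # Drop surrounding whitespace and any quotes the caller already added.
        term = term.strip().strip('"')
        if term:
            quoted.append(f'"{term}"')
    return " ".join(quoted)


query = require_all(["enterprise integration pattern", "java", "sql"])
print(query)                    # the string to paste into the search box
print(urlencode({"q": query}))  # the same query, percent-encoded as a "q" parameter
```

This only constructs the query string; it does not contact any search
engine, and the reported hit counts remain estimates either way.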

Regards,
Tristan

-- 
Tristan Miller, Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/
