[Corpora-List] Re: problems with Google

Andrew Kehoe Andrew.Kehoe at uce.ac.uk
Thu Mar 17 17:03:22 UTC 2005


Paul
 
I would imagine that Google will soon be removing support for the wildcard in their API (as well as in Google.de and Google.co.uk).
 
This page from a few weeks ago says that the wildcard is still working in FindForward.com (which uses the Google API): http://blog.outer-court.com/archive/2005-03-06-n50.html. However, if you enter “god * america” in FindForward.com today you'll find that the wildcard works only intermittently, probably depending upon which Google server the query is passed to.
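
Since the wildcard is honoured only some of the time, one way to see whether a given batch of results respected it is to check the returned snippets locally. The following is a minimal sketch in Python, not anything Google or FindForward provides: the function name wildcard_to_regex is hypothetical, and it assumes the asterisk should stand for exactly one whitespace-delimited word.

```python
import re

def wildcard_to_regex(pattern):
    """Compile a query such as 'god * america' into a regex in which
    each '*' stands for exactly one whitespace-delimited word.
    (Assumption: one asterisk = one word, never zero or several.)"""
    tokens = pattern.split()
    # '*' becomes "one run of non-space characters"; other tokens are literal.
    parts = [r"\S+" if tok == "*" else re.escape(tok) for tok in tokens]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

rx = wildcard_to_regex("god * america")

# Hypothetical snippets standing in for text returned by a search.
snippets = [
    "God bless America",
    "god will bless america",
    "God, America and apple pie",
]
for snippet in snippets:
    verdict = "honours wildcard" if rx.search(snippet) else "does not"
    print(snippet, "->", verdict)
```

Only the first snippet has exactly one word between "god" and "america", so a result set dominated by the other two kinds would suggest the wildcard was ignored for that query.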
 
Andrew Kehoe
Research and Development Unit for English Studies
University of Central England in Birmingham
 
http://www.webcorp.org.uk/

	-----Original Message----- 
	From: owner-corpora at lists.uib.no on behalf of Deane, Paul 
	Sent: Thu 17/03/2005 15:39 
	To: CORPORA at uib.no 
	Cc: 
	Subject: RE: [Corpora-List] Re: problems with Google
	
	
	Has anybody checked whether the behavior with Google's Web API and its standard search is different?
	 
	I have code using the Java Web API which makes use of the asterisk to blank out a single word (not an unrestricted wildcard). As of yesterday, when I tested the code, it still appeared to be working as designed.

		-----Original Message-----
		From: Andrew Kehoe [mailto:Andrew.Kehoe at uce.ac.uk]
		Sent: Thursday, March 17, 2005 9:27 AM
		To: CORPORA at uib.no
		Subject: RE: [Corpora-List] Re: problems with Google
		
		
		
		John
		 
		Even if you put double quotes around the wildcard character Google will ignore it. When you search for:
		 
		"what does "*" mean"
		 
		Google is actually searching for 2 'phrases': "what does " and " mean". You cannot nest double quotes in Google so the double quotes around the * are actually closing your initial quote and beginning a new quote, with the wildcard ignored completely.
		 
		It may be the case that SOME of the pages Google returns will contain "what does", followed by one other word, followed by "mean", but your query does not ask for this specifically. Google could (and does) also return pages containing "mean" and "what does" in the opposite order, or with multiple words in between.
		 
		Similarly, "what does "*" "*" mean" is actually searching for 3 'phrases': 1) "what does ", 2) " " (a space), and 3)" mean".
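
The quote-pairing described above can be illustrated with a short Python sketch. It is only a simplified model, assuming each double quote simply toggles between "inside a phrase" and "outside a phrase"; it is not Google's actual parser, and the function name split_quoted is hypothetical.

```python
def split_quoted(query):
    """Split a query on double quotes, toggling in and out of a phrase
    at every quote character (assumed model, not Google's real parser)."""
    phrases, outside = [], []
    for i, part in enumerate(query.split('"')):
        if i % 2 == 1:
            # Odd-numbered pieces sit between an opening and a closing quote.
            phrases.append(part)
        elif part:
            # Even-numbered, non-empty pieces fall outside any quoted phrase.
            outside.append(part)
    return phrases, outside

one_star = split_quoted('"what does "*" mean"')
two_stars = split_quoted('"what does "*" "*" mean"')
print(one_star)
print(two_stars)
```

Under this model the first query yields the two phrases 'what does ' and ' mean', and the second yields three phrases with a lone space in the middle; in both cases every asterisk ends up outside the quoted phrases.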
		 
		So, Google hasn't retained support for wildcards at all, I'm afraid, and this is why we are developing our own search engine in WebCorp, as Antoinette Renouf mentioned yesterday.
		 
		Andrew Kehoe
		Research and Development Unit for English Studies
		University of Central England in Birmingham
		 
		http://www.webcorp.org.uk/

			-----Original Message----- 
			From: owner-corpora at lists.uib.no on behalf of John Milton 
			Sent: Thu 17/03/2005 13:39 
			To: CORPORA at uib.no 
			Cc: 
			Subject: [Corpora-List] Re: problems with Google
			
			

			I just discovered that Google seems to have retained some use of the
			wildcard for words if you use double quotes with the asterisk. Searches
			for "what does "*" mean" and "what does "*" "*" mean" return MAINLY
			hits with any one word and any two words, respectively, in the wildcard
			position. If anyone else is using web searches as language
			learning/teaching resources, this also looks promising:
			http://www.findforward.com/
			
			John Milton
			Hong Kong University of Science & Technology
			
			
			
			







