<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<TITLE>[Corpora-List] Re: problems with Google</TITLE>
<META content="MSHTML 6.00.2800.1492" name=GENERATOR></HEAD>
<BODY dir=ltr>
<DIV>Paul</DIV>
<DIV> </DIV>
<DIV>I would imagine that Google will soon be removing support for the wildcard
from their API (as well as from Google.de and Google.co.uk).</DIV>
<DIV> </DIV>
<DIV>This page from a few weeks ago says that the wildcard is still working in
FindForward.com (which uses the Google API): <A
href="http://blog.outer-court.com/archive/2005-03-06-n50.html">http://blog.outer-court.com/archive/2005-03-06-n50.html</A>.
However, if you enter "god * america" in FindForward.com today you'll find
that the wildcard works sometimes but doesn't work other times,
probably depending upon which Google server the query is passed to.</DIV>
<DIV> </DIV>
<DIV>
<DIV>Andrew Kehoe</DIV>
<DIV>Research and Development Unit for English Studies</DIV>
<DIV>University of Central England in Birmingham</DIV>
<DIV> </DIV>
<DIV><A href="http://www.webcorp.org.uk/"
target=_BLANK>http://www.webcorp.org.uk/</A></DIV></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV><FONT size=2>-----Original Message----- <BR><B>From:</B>
owner-corpora@lists.uib.no on behalf of Deane, Paul <BR><B>Sent:</B>
Thu 17/03/2005 15:39 <BR><B>To:</B> CORPORA@uib.no <BR><B>Cc:</B>
<BR><B>Subject:</B> RE: [Corpora-List] Re: problems with
Google<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=127223715-17032005>Has
anybody checked whether the behavior with Google's Web API and its standard
search is different?</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=127223715-17032005></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=127223715-17032005>I
have code using the Java Web API which makes use of the asterisk to blank out
a single word (not an unrestricted wildcard). As of yesterday, when I tested
the code, it still appeared to be working as designed.</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Andrew Kehoe
[mailto:Andrew.Kehoe@uce.ac.uk]<BR><B>Sent:</B> Thursday, March 17, 2005
9:27 AM<BR><B>To:</B> CORPORA@uib.no<BR><B>Subject:</B> RE: [Corpora-List]
Re: problems with Google<BR><BR></FONT></DIV>
<DIV><FONT size=2>
<DIV>John</DIV>
<DIV> </DIV>
<DIV>Even if you put double quotes around the wildcard character Google will
ignore it. When you search for:</DIV>
<DIV> </DIV>
<DIV>"what does "*" mean"</DIV>
<DIV> </DIV>
<DIV>Google is actually searching for 2 'phrases': "what does " and " mean".
You cannot nest double quotes in Google so the double quotes around the *
are actually closing your initial quote and beginning a new quote, with the
wildcard ignored completely.</DIV>
<DIV> </DIV>
<DIV>It may be the case that SOME of the pages Google returns will contain
"what does", followed by one other word, followed by "mean" but your query
does not ask for this specifically. Google could (and does) also return
pages containing "mean" and "what does" in the opposite order, or with
multiple words in between.</DIV>
<DIV> </DIV>
<DIV>Similarly, "what does "*" "*" mean" is actually searching for 3
'phrases': 1) "what does ", 2) " " (a space), and 3) " mean".</DIV>
<DIV> </DIV>
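<DIV>The quote handling described above can be sketched in a few lines of Python. This is only an illustration, assuming that every double quote simply toggles phrase mode (no nesting); <CODE>split_phrases</CODE> is a hypothetical helper, not Google's actual parser.</DIV>
<PRE>
```python
def split_phrases(query):
    """Split a query on double quotes, treating each quote as a toggle.

    Assumption for illustration only: quotes never nest, so every
    double quote either opens or closes a phrase.
    Returns (quoted_phrases, unquoted_pieces).
    """
    quoted, unquoted = [], []
    # str.split('"') alternates between text outside quotes (even index)
    # and text inside quotes (odd index).
    for index, piece in enumerate(query.split('"')):
        if piece == "":
            continue  # nothing between two adjacent quotes
        if index % 2 == 1:
            quoted.append(piece)
        else:
            unquoted.append(piece)
    return quoted, unquoted


if __name__ == "__main__":
    for q in ['"what does "*" mean"', '"what does "*" "*" mean"']:
        phrases, leftovers = split_phrases(q)
        print(q, phrases, leftovers)
```
</PRE>
<DIV>Under this assumption the first query yields the two phrases "what does " and " mean", and the second yields three phrases, one of which is a single space. In both cases the asterisk ends up outside every phrase.</DIV>
<DIV> </DIV>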
<DIV>So, Google hasn't retained support for wildcards at all, I'm afraid, and
this is why we are developing our own search engine in WebCorp, as
Antoinette Renouf mentioned yesterday.</DIV>
<DIV> </DIV>
<DIV>Andrew Kehoe</DIV>
<DIV>Research and Development Unit for English Studies</DIV>
<DIV>University of Central England in Birmingham</DIV>
<DIV> </DIV>
<DIV><A href="http://www.webcorp.org.uk/"
target=_BLANK>http://www.webcorp.org.uk/</A></DIV></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV><FONT size=2>-----Original Message----- <BR><B>From:</B>
owner-corpora@lists.uib.no on behalf of John Milton
<BR><B>Sent:</B> Thu 17/03/2005 13:39 <BR><B>To:</B> CORPORA@uib.no
<BR><B>Cc:</B> <BR><B>Subject:</B> [Corpora-List] Re: problems with
Google<BR><BR></FONT></DIV>
<P><FONT size=2>I just discovered that Google seems to have retained some
use of the<BR>wildcard for words if you use double quotes with the
asterisk. Searches<BR>for "what does "*" mean" and "what does "*" "*"
mean" return MAINLY results with<BR>any one word and any two words, respectively, in place of the asterisks. If anyone
else is using web searches<BR>as language learning/teaching resources,
this also looks promising:<BR><A
href="http://www.findforward.com/">http://www.findforward.com/</A><BR><BR>John
Milton<BR>Hong Kong University of Science &amp;
Technology<BR><BR><BR><BR></FONT></P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>