<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META NAME="Generator" CONTENT="MS Exchange Server version 6.0.6556.0">
<TITLE>RE: [Corpora-List] Query about nomenclature</TITLE>
</HEAD>
<BODY dir=ltr>
<DIV><FONT size=2>John Sowa's original queries were</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>1)
ngram
<BR>2) ngram not
perl <BR>3)
n-gram </FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>To get more accurate results, these should be run
as</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>1) ngram</FONT></DIV>
<DIV><FONT size=2>2) ngram -perl</FONT></DIV>
<DIV><FONT size=2>3) "n-gram" (to force Google to match
only 'n-gram' with a hyphen) </FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>It is not necessary to run</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>"n-gram" -perl</FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV><FONT size=2>because (as Damon Allen Davison said) the Perl module we want
to filter out of the results is called Text::Ngram, not
Text::N-gram.</FONT></DIV>
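<DIV><FONT size=2></FONT>&nbsp;</DIV>
<DIV><FONT size=2>The behaviour described above can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not Google's actual matching algorithm: the names TERM and matches and the two sample page strings are invented for illustration. It treats every bare word as required, a leading minus as exclusion, ignores hyphens in unquoted terms and keeps them in quoted terms.</FONT></DIV>
<PRE>
```python
import re

# One query term: optional leading minus, then a quoted phrase or a bare word.
TERM = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')

# Hypothetical sample pages, made up for this illustration.
CPAN_PAGE = "Text::Ngram - a Perl module for ngram analysis"
PAPER_PAGE = "An n-gram language model"


def matches(query, page):
    """Toy model of the query syntax discussed in this thread."""
    text = page.lower()
    # Unquoted terms are compared with hyphens removed on both sides.
    words = set(re.findall(r"[a-z0-9]+", text.replace("-", "")))
    for neg, quoted, bare in TERM.findall(query.lower()):
        if quoted:
            found = quoted in text  # literal match, hyphen preserved
        else:
            found = bare.replace("-", "") in words
        # A required term must be found; an excluded term must not be.
        if found == bool(neg):
            return False
    return True


for query in ["ngram", "ngram not perl", "ngram -perl", "n-gram", '"n-gram"']:
    print(query, "|", matches(query, CPAN_PAGE), matches(query, PAPER_PAGE))
```
</PRE>
<DIV><FONT size=2>With these sample strings, the unquoted query n-gram matches both pages, while the quoted query "n-gram" matches only the page that really contains the hyphenated form. The query ngram not perl matches neither page, because the word "not" is treated as a third required term.</FONT></DIV>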
<DIV> </DIV>
<DIV><FONT size=2>Andrew Kehoe<BR>Research and Development Unit for English
Studies<BR>School of English<BR>University of Central England,
Birmingham<BR></FONT><A href="http://rdues.uce.ac.uk/"><FONT
size=2>http://rdues.uce.ac.uk/</FONT></A><BR><BR><A
href="http://www.webcorp.org.uk/"><FONT
size=2>http://www.webcorp.org.uk/</FONT></A></DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV><FONT size=2>-----Original Message----- <BR><B>From:</B>
owner-corpora@lists.uib.no on behalf of Normunds Gruzitis
<BR><B>Sent:</B> Fri 11/03/2005 17:53 <BR><B>To:</B> CORPORA@HD.UIB.NO
<BR><B>Cc:</B> <BR><B>Subject:</B> RE: [Corpora-List] Query about
nomenclature<BR><BR></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT size=2>Did you put "n-gram" in quotes in your search
query?<BR>Google's response to me: "Results 1 - 10 of about 63,600
for<BR>"n-gram" -perl."<BR><BR>Regards,<BR>Normunds<BR><BR><BR>-----Original
Message-----<BR>From: owner-corpora@lists.uib.no [<A
href="mailto:owner-corpora@lists.uib.no">mailto:owner-corpora@lists.uib.no</A>]On<BR>Behalf
Of Andrew Kehoe<BR>Sent: Friday, March 11, 2005 5:33 PM<BR>To: John F.
Sowa<BR>Cc: CORPORA@HD.UIB.NO<BR>Subject: RE: [Corpora-List] Query about
nomenclature<BR><BR><BR>John<BR><BR>You need to use the search term "ngram
-perl" rather than "ngram not<BR>perl" because, as Stefan Evert pointed out,
"ngram not perl" just<BR>returns pages containing all 3 of those
words.<BR><BR>Another problem with your method is that Google ignores hyphens
in<BR>search terms. One of the pages returned for the term "n-gram" is<BR><A
href="http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092">http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092</A>
but this<BR>page does not contain the word "n-gram" at all, only "ngram"
without the<BR>hyphen.<BR><BR>Andrew Kehoe<BR>Research and Development Unit
for English Studies<BR>School of English<BR>University of Central England,
Birmingham<BR><A
href="http://rdues.uce.ac.uk/">http://rdues.uce.ac.uk/</A><BR><BR><A
href="http://www.webcorp.org.uk/">http://www.webcorp.org.uk/</A><BR><BR>-----Original
Message-----<BR>From: owner-corpora@lists.uib.no [<A
href="mailto:owner-corpora@lists.uib.no">mailto:owner-corpora@lists.uib.no</A>]
On<BR>Behalf Of John F. Sowa<BR>Sent: 10 March 2005 01:43<BR>To: Damon Allen
Davison<BR>Cc: John Mckenny; CORPORA@HD.UIB.NO<BR>Subject: Re: [Corpora-List]
Query about nomenclature<BR><BR>Damon Davison's use of Google inspired me to
try<BR>a variation. I just typed three queries and<BR>got the following
number of hits:<BR><BR>Search string &nbsp;&nbsp; Hits<BR>------------- &nbsp;&nbsp; ------<BR>ngram &nbsp;&nbsp; 21,100<BR>ngram not perl &nbsp;&nbsp; 540<BR>n-gram &nbsp;&nbsp; 85,700<BR><BR>This seems to provide overwhelming evidence for<BR>a hyphen
between "n" and "gram". Since Google<BR>doesn't distinguish capitals,
that leaves the<BR>capitalization question unresolved.<BR><BR>John
Sowa<BR></FONT></P></BLOCKQUOTE>
</BODY>
</HTML>