[Corpora-List] Query about nomenclature

Andrew Kehoe Andrew.Kehoe at uce.ac.uk
Fri Mar 11 20:07:43 UTC 2005


John Sowa's original queries were
 
1) ngram                   
2) ngram not perl            
3) n-gram       
 
To get more accurate results, these should be run as
 
1) ngram
2) ngram -perl
3) "n-gram"    (to force Google to match only 'n-gram' with a hyphen)      
 
It is not necessary to run
 
"n-gram" -perl
 
because (as Damon Allen Davison said) the Perl module we want to filter out of the results is called Text::Ngram, not Text::N-gram.
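
For anyone who wants to script these searches, the three query strings above can be assembled as in the rough Python sketch below. The helper names build_query and search_url are hypothetical, and the base URL is only an assumed search endpoint used for illustration; the sketch builds and URL-encodes the query strings but does not send any requests or count hits.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; nothing is actually requested.
BASE_URL = "https://www.google.com/search"


def build_query(term, exclude=(), exact=False):
    """Build a search string (hypothetical helper).

    exact=True wraps the term in double quotes, as in "n-gram".
    Each word in `exclude` is prefixed with '-', as in -perl.
    """
    parts = [f'"{term}"' if exact else term]
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query string as the q parameter."""
    return f"{BASE_URL}?{urlencode({'q': query})}"


queries = [
    build_query("ngram"),                    # 1) ngram
    build_query("ngram", exclude=["perl"]),  # 2) ngram -perl
    build_query("n-gram", exact=True),       # 3) "n-gram"
]

for q in queries:
    print(q, "->", search_url(q))
```

Keeping the exclusion and the quoting as separate options makes it easy to try a fourth combination ("n-gram" -perl) if the module name ever turns up in hyphenated form.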
 
Andrew Kehoe
Research and Development Unit for English Studies
School of English
University of Central England, Birmingham
http://rdues.uce.ac.uk/

http://www.webcorp.org.uk/
 
 
-----Original Message----- 
From: owner-corpora at lists.uib.no on behalf of Normunds Gruzitis 
Sent: Fri 11/03/2005 17:53 
To: CORPORA at HD.UIB.NO 
Cc: 
Subject: RE: [Corpora-List] Query about nomenclature



	Did you put "n-gram" in quotes in your search query?
	Google's response to me: "Results 1 - 10 of about 63,600 for
	"n-gram" -perl."
	
	Regards,
	Normunds
	
	
	-----Original Message-----
	From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no]On
	Behalf Of Andrew Kehoe
	Sent: Friday, March 11, 2005 5:33 PM
	To: John F. Sowa
	Cc: CORPORA at HD.UIB.NO
	Subject: RE: [Corpora-List] Query about nomenclature
	
	
	John
	
	You need to use the search term "ngram -perl" rather than "ngram not
	perl" because, as Stefan Evert pointed out, "ngram not perl" just
	returns pages containing all three of those words.
	
	Another problem with your method is that Google ignores hyphens in
	search terms. One of the pages returned for the term "n-gram" is
	http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092 but this
	page does not contain the word "n-gram" at all, only "ngram" without the
	hyphen.
	
	Andrew Kehoe
	Research and Development Unit for English Studies
	School of English
	University of Central England, Birmingham
	http://rdues.uce.ac.uk/
	
	http://www.webcorp.org.uk/
	
	-----Original Message-----
	From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
	Behalf Of John F. Sowa
	Sent: 10 March 2005 01:43
	To: Damon Allen Davison
	Cc: John Mckenny; CORPORA at HD.UIB.NO
	Subject: Re: [Corpora-List] Query about nomenclature
	
	Damon Davison's use of Google inspired me to try
	a variation.  I just typed three queries and
	got the following number of hits:
	
	Search string            Hits
	-------------           ------
	ngram                   21,100
	
	ngram not perl             540
	
	n-gram                  85,700
	
	This seems to provide overwhelming evidence for
	a hyphen between "n" and "gram".  Since Google
	doesn't distinguish capitals, that leaves the
	capitalization question unresolved.
	
	John Sowa
	
