[Corpora-List] Query about nomenclature
Andrew Kehoe
Andrew.Kehoe at uce.ac.uk
Fri Mar 11 20:07:43 UTC 2005
John Sowa's original queries were
1) ngram
2) ngram not perl
3) n-gram
To get more accurate results, these should be run as
1) ngram
2) ngram -perl
3) "n-gram" (to force Google to match only 'n-gram' with a hyphen)
It is not necessary to run
"n-gram" -perl
because (as Damon Allen Davison said) the Perl module we want to filter out of the results is called Text::Ngram, not Text::N-gram.
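The difference between the three query forms can be illustrated with a toy matcher. This is only a sketch of the behaviour described in this thread, not Google's actual implementation: the function names `tokenize` and `matches` and the sample page texts are hypothetical, and treating hyphens as ignored in both directions for unquoted terms is a simplifying assumption.

```python
import re

def tokenize(text):
    # Lowercase word tokens; hyphenated forms such as "n-gram" stay whole.
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))

def matches(query, text):
    """Toy model: plain terms are all required, -term excludes,
    and a quoted term must appear exactly (hyphen included)."""
    tokens = tokenize(text)
    # Hyphen-insensitive view of the page, used for unquoted terms (assumption).
    loose = {t.replace("-", "") for t in tokens}
    for term in re.findall(r'-?"[^"]+"|\S+', query.lower()):
        negated = term.startswith("-")
        if negated:
            term = term[1:]
        if term.startswith('"') and term.endswith('"'):
            found = term.strip('"') in tokens
        else:
            found = term.replace("-", "") in loose
        # Fail if an excluded term is present or a required term is missing.
        if found == negated:
            return False
    return True

# Hypothetical page texts, made up for illustration only.
perl_page = "This ngram counter is not a Perl module"
script_page = "ngram.pl counts ngram frequencies in Perl"
paper_page = "An n-gram language model"

print(matches("ngram not perl", perl_page))   # "not" is just a third required word
print(matches("ngram -perl", perl_page))      # excluded because the page mentions Perl
print(matches("n-gram", script_page))         # unquoted hyphen is ignored
print(matches('"n-gram"', script_page))       # quoted form requires the hyphen
print(matches('"n-gram"', paper_page))
```

In this model, "ngram not perl" only matches pages containing all three words, whereas "ngram -perl" drops any page that mentions Perl, and only the quoted "n-gram" distinguishes the hyphenated spelling.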
Andrew Kehoe
Research and Development Unit for English Studies
School of English
University of Central England, Birmingham
http://rdues.uce.ac.uk/
http://www.webcorp.org.uk/
-----Original Message-----
From: owner-corpora at lists.uib.no on behalf of Normunds Gruzitis
Sent: Fri 11/03/2005 17:53
To: CORPORA at HD.UIB.NO
Cc:
Subject: RE: [Corpora-List] Query about nomenclature
Did you put "n-gram" in quotes in your search query?
Google's response to me: "Results 1 - 10 of about 63,600 for
"n-gram" -perl."
Regards,
Normunds
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no]On
Behalf Of Andrew Kehoe
Sent: Friday, March 11, 2005 5:33 PM
To: John F. Sowa
Cc: CORPORA at HD.UIB.NO
Subject: RE: [Corpora-List] Query about nomenclature
John
You need to use the search term "ngram -perl" rather than "ngram not
perl" because, as Stefan Evert pointed out, "ngram not perl" just
returns pages containing all 3 of those words.
Another problem with your method is that Google ignores hyphens in
search terms. One of the pages returned for the term "n-gram" is
http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092 but this
page does not contain the word "n-gram" at all, only "ngram" without the
hyphen.
Andrew Kehoe
Research and Development Unit for English Studies
School of English
University of Central England, Birmingham
http://rdues.uce.ac.uk/
http://www.webcorp.org.uk/
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of John F. Sowa
Sent: 10 March 2005 01:43
To: Damon Allen Davison
Cc: John Mckenny; CORPORA at HD.UIB.NO
Subject: Re: [Corpora-List] Query about nomenclature
Damon Davison's use of Google inspired me to try
a variation. I just typed three queries and
got the following number of hits:
Search string      Hits
-------------      ------
ngram              21,100
ngram not perl        540
n-gram             85,700
This seems to provide overwhelming evidence for
a hyphen between "n" and "gram". Since Google
doesn't distinguish capitals, that leaves the
capitalization question unresolved.
John Sowa