[Corpora-List] Query about nomenclature

Damon Allen Davison allolex at gmail.com
Wed Mar 9 21:46:30 UTC 2005


Dear John,

Here are some rather unscientific results. My corpus was a single page
of Google results (limited to 100 hits) for the search term "n gram".
Searching for both "ngram" and "n gram" was slightly problematic
because there is a Perl CPAN module called Text::Ngram, which inflates
the count for "ngram" quite a bit.

n-gram : 128 times
N-gram : 126 times
ngram : 57 times
N-Gram : 34 times
Ngram : 10 times
N-GRAM : 9 times
NGRAM : 8 times
n-Gram : 7 times
NGram : 5 times


I got these counts with the Perl script below, after converting the
results page I had saved to plain text with "links --dump
results.html > results.txt".

#!/usr/bin/perl
# syntax: findword <filename>
use warnings;
use strict;
my %total;
while ( <> ) {
        # case-insensitive, case-preserving matching, dash optional;
        # /g counts every occurrence on a line, not just the first
        my @matches = /(n-?gram)/gi;
        $total{$_}++ foreach @matches;
}
# print the forms from most to least frequent
print map { "$_ : $total{$_} times\n" }
        sort { $total{$b} <=> $total{$a} } keys %total;

Anyway, I hope that helps a little. You can use the same script to do
searches on other files. :)
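
To search for something other than "n-gram", the pattern has to be
changed by hand in the script. A minimal sketch of one way to avoid
that, assuming the pattern is passed as the first command-line argument
(the script name "countforms" and the helper count_forms are
hypothetical, made up for illustration):

```perl
#!/usr/bin/perl
# syntax: countforms <pattern> [<filename> ...]
# (hypothetical variant: the pattern argument is an assumption,
#  not part of the findword script)
use warnings;
use strict;

# Count each distinct spelling matched by $pattern in @lines,
# matching case-insensitively but keeping the case that was found.
sub count_forms {
    my ($pattern, @lines) = @_;
    my %total;
    for my $line (@lines) {
        # $1 is always the whole match, even if $pattern has groups
        while ( $line =~ /($pattern)/gi ) {
            $total{$1}++;
        }
    }
    return %total;
}

if (@ARGV) {
    my $pattern = shift @ARGV;
    # <> reads the remaining files (or standard input) line by line
    my %total = count_forms($pattern, <>);
    # most frequent first; ties broken alphabetically
    print "$_ : $total{$_} times\n"
        foreach sort { $total{$b} <=> $total{$a} || $a cmp $b }
                keys %total;
} else {
    warn "usage: countforms <pattern> [<filename> ...]\n";
}
```

It would be called as, for example, "perl countforms 'n-?gram'
results.txt". The pattern is interpolated as a regular expression, so
literal metacharacters in a search term need escaping.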

I like to use "n-gram".

Warm regards,

Damon
--

Damon Allen Davison
http://allolex.net


