Corpora: Re: co-occurrences
Bengt Dahlqvist
bengt.dahlqvist at ling.uu.se
Fri Apr 28 08:01:11 UTC 2000
At 23:34 2000-04-17 -0700, Victoria Powers wrote:
>I sent out an earlier Email about this issue and I haven't found what I
>needed so I thought if I explained what I was looking for better someone
>might have seen a program that would work. I am looking for something that
>will compute co-occurrences. I will be integrating this
>with Perl code on a unix box so I need something that will just output a
>text file of co-occurrences when I run the program on some corpus.
Briefly, try something like this:
A Korn shell script:
#!/bin/ksh
# find.sh
tr '\n' ' ' < text_in | tr '.,:;?!' '\n' | ./co.pl "$1" | sort | uniq -c > list_out
return
A Perl script:
#!/usr/bin/perl
# co.pl
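$keyword = $ARGV[0];    # scalar element, not the one-element slice @ARGV[0]
while (<STDIN>) {
chomp;                  # strip only a trailing newline, never a real character
# \Q..\E matches the keyword literally; (?:^|\s) also finds it at line start
while (m/(?:^|\s)\Q$keyword\E\s+(\S+)/g) {
print "$keyword $1\n"; } }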
Then just invoke the script with the desired keyword; it reads the corpus
from text_in and writes the counted pairs to list_out:
./find.sh for
Note that this only pairs the keyword with the word immediately following
it. You may also want other clause/sentence delimiters, and perhaps a way
to handle words within quotes and parentheses.
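
One way to widen the delimiter set is sketched below. It is a minimal
sketch, not a tested replacement for the scripts above: the function name
cooc_pairs and the choice of extra delimiters (double quotes and
parentheses) are illustrative assumptions, and it assumes a tr that pads
the shorter second set, as GNU and BSD tr do. To keep the sketch
self-contained, the pair extraction is done with awk field comparison
rather than co.pl.

```shell
#!/bin/sh
# cooc_pairs KEYWORD: hypothetical helper, not part of find.sh or co.pl.
# Reads text on stdin and prints "count KEYWORD next-word" lines.
cooc_pairs() {
    # Join lines, then break at punctuation, double quotes and parentheses
    # so that no pair is counted across those boundaries.
    tr '\n' ' ' | tr '.,:;?!"()' '\n' |
    # Assumption: awk stands in for co.pl here. It compares whole fields,
    # so the keyword is matched literally, also at the start of a segment.
    awk -v kw="$1" '{ for (i = 1; i < NF; i++) if ($i == kw) print kw, $(i + 1) }' |
    sort | uniq -c
}

# Example: "for help" occurs twice and "for once" once in this input.
printf 'He asked (for help), "for once". Ask for help!\n' | cooc_pairs for
```

Treating quotes and parentheses as plain delimiters is the simplest
choice; it means a quoted or bracketed passage is counted as a clause of
its own rather than as part of the surrounding sentence.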
Bengt Dahlqvist, Ph.D.
Uppsala University