Corpora: Re: co-occurrences

Bengt Dahlqvist bengt.dahlqvist at ling.uu.se
Fri Apr 28 08:01:11 UTC 2000


At 23:34 2000-04-17 -0700, Victoria Powers wrote:
>I sent out an earlier Email about this issue and I haven't found what I
>needed so I thought if I explained what I was looking for better someone
>might have seen a program that would work. I am looking for something that
>will compute co-occurrences. I will be integrating this
>with Perl code on a unix box so I need something that will just output a
>text file of co-occurrences when I run the program on some corpus.

Briefly, try something like this:
A Korn shell script:
   #!/bin/ksh
   # find.sh
   # Join all lines, break into clauses at punctuation, extract pairs, count them
   tr '\n' ' ' < text_in | tr '.,:;?!' '\n' | ./co.pl "$1" | sort | uniq -c > list_out
   return
A Perl script:
   #!/usr/bin/perl
   # co.pl
   $keyword = $ARGV[0];
   while (<STDIN>) {
     chomp;
     # \b also catches the keyword at the start of a clause; \Q..\E quotes it
     while (m/\b\Q$keyword\E\s+(\S+)/g) {
        print "$keyword $1\n"; } }
Then just invoke the script with the desired keyword (it reads the corpus
from text_in and writes the counted pairs to list_out):
   ./find.sh for
Beware that one might want other clause/sentence delimiters and
maybe also a way to handle words within quotes and parentheses.
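As a rough sketch of that point, the splitting stage could be moved into a
small shell function so that the delimiter set is easy to change. The name
split_clauses is made up for illustration, and simply deleting double quotes
and parentheses is only one possible way of treating them:

```shell
# split_clauses: hypothetical helper, not part of the scripts above.
# Reads text on stdin and writes one clause per line on stdout.
split_clauses() {
    tr '\n' ' ' |        # join the physical lines of the text
    tr -d '"()' |        # assumption: just drop double quotes and parentheses
    tr '.,:;?!' '\n'     # break at clause/sentence delimiters; extend as needed
}
```

In find.sh the two tr calls would then be replaced by
"split_clauses < text_in", with the rest of the pipeline unchanged.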

Bengt Dahlqvist, Ph.D.
Uppsala University


