Corpora: code for random selection of concordance lines

Bruce L. Lambert, Ph.D. lambertb at uic.edu
Thu Mar 21 20:23:35 UTC 2002


At 04:05 PM 3/21/2002 -0300, Tony Berber Sardinha wrote:
>Dear list members
>
>I wonder if anyone has a bit of perl or java code (or unix utilities) for
>drawing an x number of lines at random from a concordance?

#!/bin/sh

IFILE="$1"
N="$2"

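# Prefix each line with a fixed-width random key and a tab, sort on the key,
# then cut the key off again; the rest of the line is left untouched.
gawk 'BEGIN {srand()} {printf "%.10f\t%s\n", rand(), $0}' "$IFILE" |
    sort | cut -f2- | head -n "$N"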


On a Unix system that has gawk: Copy this into a file called 'randomize'.
At the prompt (~>) type:

~> chmod +x randomize

then

~> ./randomize some_input_file N > some_output_file


N is the number of lines desired in the output. If your system does not
have gawk, you can download and install it or try awk (you'll need to
change gawk to awk in the script).

-bruce


