Corpora: code for random selection of concordance lines
Bruce L. Lambert, Ph.D.
lambertb at uic.edu
Thu Mar 21 20:23:35 UTC 2002
At 04:05 PM 3/21/2002 -0300, Tony Berber Sardinha wrote:
>Dear list members
>
>I wonder if anyone has a bit of perl or java code (or unix utilities) for
>drawing an x number of lines at random from a concordance?
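#!/bin/sh
# Usage: randomize INPUT_FILE N
IFILE="$1"
N="$2"
# Prefix each line with a random key and a tab, sort on the key to
# shuffle, cut the key off again, and keep the first N lines.
# (cut leaves the original spacing of each concordance line intact.)
gawk 'BEGIN {srand()} {printf "%.9f\t%s\n", rand(), $0}' "$IFILE" |
  sort -n | cut -f2- | head -n "$N"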
On a Unix system that has gawk: Copy this into a file called 'randomize'.
At the prompt (~>) type:
~> chmod +x randomize
then
~> ./randomize some_input_file N > some_output_file
N is the number of lines desired in the output. If your system does not
have gawk, you can download and install it or try awk (you'll need to
change gawk to awk in the script).
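
For a very large concordance, sorting the whole file just to keep N lines
can be slow. A minimal sketch of an alternative, assuming a POSIX awk that
supports -v (the function name sample_lines is made up for illustration):
it uses reservoir sampling, which is a different technique from the
sort-based script above, and reads the file once while holding only N
lines in memory.

```shell
#!/bin/sh
# sample_lines FILE N: print N lines drawn uniformly at random from FILE.
# Hypothetical helper; reservoir sampling, not the sort-based method.
sample_lines() {
    awk -v n="$2" '
        BEGIN { srand() }
        NR <= n { pool[NR] = $0; next }   # fill the reservoir with the first n lines
        {
            j = int(rand() * NR) + 1      # uniform integer in 1..NR
            if (j <= n) pool[j] = $0      # keep this line with probability n/NR
        }
        END {
            m = (NR < n) ? NR : n         # short file: print every line it has
            for (i = 1; i <= m; i++) print pool[i]
        }
    ' "$1"
}
```

After defining the function in your shell (or sourcing the file that holds
it), use it like this:
~> sample_lines some_input_file 50 > some_output_file
The sample is random, but the output order is not a full shuffle, so pipe
the result through the sort-based script if the order matters too. As with
the script above, srand() seeds from the clock, so two runs within the same
second will typically pick the same lines.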
-bruce