Corpora: code for random selection of concordance lines
Alexander Clark
asc at aclark.demon.co.uk
Fri Mar 22 08:43:53 UTC 2002
Rosie Jones wrote:
>
> On Thu, 21 Mar 2002, Tony Berber Sardinha wrote:
> > I wonder if anyone has a bit of perl or java code (or unix utilities)
> > for drawing an x number of lines at random from a concordance?
> [...]
>
>
An alternative is to shuffle the whole file with the Fisher-Yates
algorithm, which is linear in the number of lines, and then take the
head. This is more efficient in time, provided the file fits in memory.

Usage, where N is the number of lines wanted:

shuffle.pl < file | head -n N

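#!/usr/bin/perl -w
# Shuffle the lines of the input at random
# using the Fisher-Yates algorithm.
use strict;

# Read every input line into memory.
my @lines = <>;

# Walk from the last index down to 1, swapping each line with a
# randomly chosen line at or before it. Empty input skips the loop.
for (my $i = $#lines; $i > 0; $i--) {
    my $j = int rand($i + 1);
    @lines[$i, $j] = @lines[$j, $i];
}
print @lines;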
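
Since only the first N lines are kept, the shuffle can in principle stop after N swaps. What follows is a minimal sketch of that partial Fisher-Yates variant, not part of the script above. The name sample_lines and the command-line handling (N as the first argument) are assumptions made for illustration. It still reads the whole input into memory.

```perl
#!/usr/bin/perl
# Sketch: pick N lines at random with a partial Fisher-Yates shuffle.
# sample_lines and the argument handling are hypothetical names/choices.
use strict;
use warnings;

# Return $n elements drawn uniformly at random, without replacement,
# from the remaining arguments.
sub sample_lines {
    my ($n, @lines) = @_;
    $n = @lines if $n > @lines;    # cannot draw more lines than exist
    for my $i (0 .. $n - 1) {
        # Choose a position from $i to the end and swap it into slot $i.
        my $j = $i + int rand(@lines - $i);
        @lines[$i, $j] = @lines[$j, $i];
    }
    return @lines[0 .. $n - 1];
}

# When run as a script with arguments: the first argument is N,
# and the lines come from the named files or from standard input.
if (!caller && @ARGV) {
    my $n = shift @ARGV;
    die "usage: $0 N [file ...]\n" unless $n =~ /^\d+$/;
    print sample_lines($n, <>);
}
```

Saved under a hypothetical name such as sample.pl, it would be run as "sample.pl N < file". The saving over a full shuffle is only in the number of swaps; reading the file is still linear in the number of lines.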
--
Alexander Clark
asc at aclark.demon.co.uk
http://www.issco.unige.ch/staff/clark/index.html
ISSCO/ETI, University of Geneva