Corpora: code for random selection of concordance lines

Alexander Clark asc at aclark.demon.co.uk
Fri Mar 22 08:43:53 UTC 2002


Rosie Jones wrote:
>
> On Thu, 21 Mar 2002, Tony Berber Sardinha wrote:
> > I wonder if anyone has a bit of perl or java code (or unix utilities)
> > for drawing an x number of lines at random from a concordance?
> [...]
>
>

An alternative is to shuffle the whole file with the Fisher-Yates
algorithm, which is linear in the number of lines, and then take the
head.  This is faster, provided the file fits in memory.


shuffle.pl < file | head -n N    # N = number of lines to keep


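#!/usr/bin/perl -w
# Shuffle the lines of the input at random
# using the Fisher-Yates algorithm.

use strict;

my @lines = <>;    # read every input line into memory

# Walk from the last index down to 1, swapping each line with a
# randomly chosen line at or below it.  Empty input skips the loop.
for (my $i = $#lines; $i > 0; $i--) {
    my $j = int rand($i + 1);    # 0 <= $j <= $i
    @lines[$i, $j] = @lines[$j, $i];
}
print @lines;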
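
If only a few lines are wanted, the shuffle can stop early.  The
following is a rough sketch rather than a tested tool: the script name
sample.pl and the subroutine sample_lines are made up for illustration.
It still reads the whole file into memory, but it shuffles only the
first N positions and prints those, so no head is needed.

```perl
#!/usr/bin/perl
# Hypothetical sample.pl: print N lines chosen at random from the input.
use strict;
use warnings;

# Return $n elements of @items chosen uniformly at random, without
# replacement.  Only the first $n positions are shuffled
# (a partial Fisher-Yates).
sub sample_lines {
    my ($n, @items) = @_;
    $n = @items if $n > @items;    # cannot draw more lines than exist
    for my $i (0 .. $n - 1) {
        # Pick a position from $i to the end and swap it into slot $i.
        my $j = $i + int rand(@items - $i);
        @items[$i, $j] = @items[$j, $i];
    }
    return @items[0 .. $n - 1];
}

# Runs only when a count is given on the command line.
if (@ARGV) {
    my $n = shift @ARGV;
    print sample_lines($n, <>);
}
```

Assuming the script is saved as sample.pl, it would be called as

sample.pl 50 < file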


--
Alexander Clark
asc at aclark.demon.co.uk
http://www.issco.unige.ch/staff/clark/index.html
ISSCO/ETI, University of Geneva


