Corpora: a program needed

David Graff graff at
Thu May 30 14:35:22 UTC 2002


The command-line Perl script I sent you earlier (which I failed to copy
to the list) could actually be expressed more briefly.  Again, granting
that the data is already tokenized to one word token per line:

cat | \
 perl -pe 's/(\S+)/exists($t{$1}) ? $t{$1}:($t{$1}=++$tc)/e'

    Best regards,

	Dave Graff

