Sampo,
The command-line Perl script I sent you earlier (which I failed to copy
to the list) could actually be expressed more briefly. Again, assuming
that the data is already tokenized to one word token per line:
cat token.stream | \
perl -pe 's/(\S+)/exists($t{$1}) ? $t{$1}:($t{$1}=++$tc)/e'
Best regards,
Dave Graff