Sampo,

The command-line perl script I sent you earlier (which I failed to copy to the list) could actually be expressed more briefly. Again, granting that the data is already tokenized to one word token per line:

  cat token.stream | \
    perl -pe 's/(\S+)/exists($t{$1}) ? $t{$1}:($t{$1}=++$tc)/e'

Best regards,

Dave Graff