Corpora: a program needed

David Graff graff at unagi.cis.upenn.edu
Thu May 30 14:35:22 UTC 2002


Sampo,

The command-line Perl script I sent you earlier (which I failed to copy
to the list) could actually be expressed more briefly.  Again, assuming
that the data is already tokenized to one word token per line, this
replaces each token with an integer ID, numbered by first appearance:

cat token.stream | \
 perl -pe 's/(\S+)/exists($t{$1}) ? $t{$1}:($t{$1}=++$tc)/e'


    Best regards,

	Dave Graff


