[Corpora-List] Application for lemmatising corpora

Adam Funk a.funk at dcs.shef.ac.uk
Fri Mar 23 16:51:47 UTC 2007


Hunter, Duncan wrote:
> Hi all,
>  
> Thanks, I have been looking at the applications suggested. Unfortunately, what I'm looking for is so simple that it might not be something that many people actually use. My texts are untagged, and I'd like to keep them that way for the moment. I actually want the lemmas to be inserted right there in the text, so you get, for example: 'Yesterday I GO to the market.'
>  
> I guess what I'm looking for is a kind of find/replace application that can read off a file of (lemmatising) replacements like GO>go, went, gone, going...!

Is the Porter stemmer close enough for your purpose?

There's a Perl implementation of it (the Text::English module linked
below), which you could probably combine with `perl -i.bak -p -e` to
modify the text files in place.

http://search.cpan.org/~ulpfr/perlindex-1.502/lib/Text/English.pm
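
One caveat: a stemmer only strips suffixes, so it will not map an
irregular form like "went" to "go". If that matters, the find/replace
idea from the quoted message can be sketched directly as a table lookup.
What follows is only a rough sketch in Python, not a tested tool. It
assumes a replacement file with one entry per line in the form
`GO>go, went, gone, going`, and the function names `parse_lemma_table`
and `lemmatise` are made up for illustration.

```python
import re

def parse_lemma_table(lines):
    """Build a form -> lemma mapping from lines like 'GO>go, went, gone, going'."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or ">" not in line:
            continue  # skip blank or malformed lines
        lemma, forms = line.split(">", 1)
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                table[form] = lemma.strip()
    return table

def lemmatise(text, table):
    """Replace every listed word form with its lemma; leave everything else alone."""
    def swap(match):
        word = match.group(0)
        # Look up case-insensitively; unknown words are returned unchanged.
        return table.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", swap, text)

# Hypothetical two-entry table; a real one would be read from a file.
table = parse_lemma_table([
    "GO>go, went, gone, going",
    "BE>be, am, is, are, was, were, been, being",
])
print(lemmatise("Yesterday I went to the market.", table))
```

With the two entries above this prints 'Yesterday I GO to the market.'
Since it is a plain lookup, a form that belongs to more than one lemma
(e.g. "saw") simply gets whichever entry was read last, so the table
would need some care.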


