Fwd: [Corpora-List] Application for lemmatising corpora

Grzegorz Chrupała grzegorz at pithekos.net
Fri Mar 23 17:44:18 UTC 2007


Something like the following Ruby script would do this. It assumes each
line of the lemma file holds a lemma followed by its inflected forms,
separated by whitespace, for example "GO went gone going":

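#!/usr/bin/ruby

# Build a hash mapping each inflected form to its lemma.
# Each line of the file is a lemma followed by its forms: "GO went gone going".
def read_dict(path)
    dict = {}
    # File.foreach closes the file once all lines have been read.
    File.foreach(path) do |line|
        words = line.split
        lemma = words.shift
        words.each { |w| dict[w] = lemma }
    end
    dict
end

# Copy inp to standard output, replacing every whitespace-separated
# token that has an entry in dict with its lemma.
def lemmatize(dict, inp)
    while line = inp.gets
        puts line.split.map { |w| dict[w] || w }.join(' ')
    end
end

# The lemma file is the first argument; the text is read from standard input.
lemmatize(read_dict(ARGV[0]), STDIN)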
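
One limitation of the script above: it compares whole whitespace-separated
tokens, so "went," or "Went" would not be found in the hash, and runs of
spaces are collapsed to one. A minimal sketch of a variant that replaces
only the letters of each word and leaves punctuation and spacing alone
might look like this. The names build_dict and lemmatize_line are made up
for illustration, and folding case with downcase is an assumption about
what is wanted, not something taken from the original request:

```ruby
# Sketch only: build_dict and lemmatize_line are illustrative names.

# Build a form => lemma hash from lines such as "GO went gone going".
# Forms are stored in lowercase so that lookup can ignore case (assumption).
def build_dict(lines)
  dict = {}
  lines.each do |line|
    lemma, *forms = line.split
    next if lemma.nil?                      # skip blank lines
    forms.each { |form| dict[form.downcase] = lemma }
  end
  dict
end

# Replace each run of letters that has an entry in dict with its lemma.
# Punctuation, digits and whitespace are passed through unchanged.
def lemmatize_line(dict, line)
  line.gsub(/[[:alpha:]]+/) { |word| dict.fetch(word.downcase, word) }
end

dict = build_dict(["GO went gone going"])
puts lemmatize_line(dict, "Yesterday I went to the market.")
```

To run it over real files, the hash could be built with
build_dict(File.foreach(path)) and lemmatize_line called on each line of
STDIN. Because only runs of letters are matched, a form containing an
apostrophe or hyphen would need a wider pattern.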


On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:
>
> Hi all,
>
> Thanks, I have been looking at the applications suggested. Unfortunately,
> what I'm looking for is so simple that it might not be something that many
> people actually use. My texts are untagged, and I'd like to keep them that
> way for the moment. I actually want the lemmas to be inserted right there in
> the text, so you get, for example: 'Yesterday I GO to the market.'
>
> I guess what I'm looking for is a kind of find/replace application that can
> read off a file of (lemmatising) replacements like GO>go, went, gone,
> going...!
>
> Apologies for not making this clearer!
>
> Duncan Hunter
>

--
'gʒɛgɔʃ


