Fwd: [Corpora-List] Application for lemmatising corpora

Matthew Purver mpurver at stanford.edu
Fri Mar 23 18:29:41 UTC 2007


And if you're looking for a ready-made file of lemmas in that format, I once
produced a very similar one for English from the Oxford Advanced Learner's
Dictionary. It's available here:

http://www.stanford.edu/~mpurver/software.html
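
One caveat for anyone adapting the script quoted below: it matches
whitespace-separated tokens exactly, so a capitalised form or one followed by
punctuation ("Went", "went,") is left untouched. Here is a minimal Ruby sketch
of one way around that. It assumes the "GO went gone going" line layout from
the quoted message, which may not be exactly the layout of the file at the URL
above. The names build_lookup and lemmatize_text are made up for this
illustration.

```ruby
# Build a form => lemma lookup from lines such as "GO went gone going".
# Assumption: the first token on each line is the lemma, the rest are forms.
def build_lookup(lines)
  lookup = {}
  lines.each do |line|
    lemma, *forms = line.split
    next if lemma.nil?                      # skip blank lines
    # Store forms in lower case so that matching can ignore case.
    forms.each { |form| lookup[form.downcase] = lemma }
  end
  lookup
end

# Replace each alphabetic word with its lemma, leaving punctuation and
# spacing as they are. Words without an entry pass through unchanged.
def lemmatize_text(lookup, text)
  text.gsub(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) do |word|
    lookup.fetch(word.downcase, word)
  end
end

lookup = build_lookup(["GO go went gone going goes"])
puts lemmatize_text(lookup, "Yesterday I went to the market, then Went home.")
```

With the single sample entry above, this prints
"Yesterday I GO to the market, then GO home." Matching words with a regular
expression rather than splitting on whitespace is a design choice made here to
keep the original punctuation and spacing intact.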

Grzegorz Chrupała wrote:
> Something like the following Ruby script would do this (where one line
> in the file with lemmas looks like this: "GO went gone going"):
> 
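> #!/usr/bin/ruby
> 
> # Build a Hash mapping each inflected form to its lemma.
> # Each line of the file holds a lemma followed by its forms.
> def read_dict(path)
>    dict = Hash.new
>    # File.foreach closes the file once every line has been read
>    File.foreach(path) do |line|
>        words = line.split
>        lemma = words.shift
>        words.each do |w| dict[w] = lemma end
>    end
>    return dict
> end
> 
> # Replace every whitespace-separated token that has a dictionary entry
> # with its lemma; unknown tokens pass through unchanged.
> def lemmatize(dict, inp)
>    while line = inp.gets
>        puts(line.split.map { |w| dict[w] || w }.join(' '))
>    end
> end
> 
> lemmatize(read_dict(ARGV[0]), STDIN)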
> 
> 
> On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:
>>
>> Hi all,
>>
>> Thanks, I have been looking at the applications suggested. Unfortunately,
>> what I'm looking for is so simple that it might not be something that many
>> people actually use. My texts are untagged, and I'd like to keep them that
>> way for the moment. I actually want the lemmas to be inserted right there in
>> the text, so you get, for example: 'Yesterday I GO to the market.'
>>
>> I guess what I'm looking for is a kind of find/replace application that can
>> read off a file of (lemmatising) replacements like GO>go, went, gone,
>> going...!
>>
>> Apologies for not making this clearer!
>>
>> Duncan Hunter
>>
> 
> -- 
> 'gʒɛgɔʃ

-- 
Matthew Purver <mpurver at stanford.edu>
Computational Semantics Laboratory, CSLI, Stanford


