Fwd: [Corpora-List] Application for lemmatising corpora
Matthew Purver
mpurver at stanford.edu
Fri Mar 23 18:29:41 UTC 2007
And if you're looking for a file with those lemmas in, I once produced a
very similar one for English from the Oxford Advanced Learner's
Dictionary - it's available here:
http://www.stanford.edu/~mpurver/software.html
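
In case it helps, here is a rough sketch of a variant of the script quoted below that also copes with punctuation and capitalisation, so that "went," or "Went" still come out as GO. It is only an illustration under a few assumptions: the function names parse_lemma_lines and lemmatize_text are made up for this example, the file is assumed to contain lines such as "GO>go, went, gone, going" (the "GO went gone going" layout parses too), and the layout of the file at the URL above has not been checked against it.

```ruby
# Build a form => lemma table from lines such as "GO>go, went, gone, going".
# The first item on a line is the lemma; every later item is one of its forms.
# Keys are downcased so that lookup can ignore capitalisation.
def parse_lemma_lines(lines)
  dict = {}
  lines.each do |line|
    lemma, *forms = line.split(/[>,\s]+/).reject(&:empty?)
    next if lemma.nil? # skip blank lines
    forms.each { |form| dict[form.downcase] = lemma }
  end
  dict
end

# Replace each alphabetic word that has an entry with its lemma.
# Punctuation and spacing are left exactly as they were.
def lemmatize_text(dict, text)
  text.gsub(/[[:alpha:]]+(?:'[[:alpha:]]+)?/) do |word|
    dict.fetch(word.downcase, word)
  end
end

dict = parse_lemma_lines(["GO>go, went, gone, going"])
puts lemmatize_text(dict, "Yesterday I went to the market, going slowly.")
# prints: Yesterday I GO to the market, GO slowly.
```

To run it over real files, something like parse_lemma_lines(File.foreach(ARGV[0])) followed by STDIN.each_line { |l| puts lemmatize_text(dict, l) } should do. Note that a form shared by two lemmas simply takes whichever lemma comes last in the file.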
Grzegorz Chrupała wrote:
> Something like the following Ruby script would do this (where one line
> in the file with lemmas looks like this: "GO went gone going"):
>
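> #!/usr/bin/ruby
>
> # Build a hash from each inflected form to its lemma. The first word
> # on a line is the lemma; the remaining words are its forms.
> def read_dict(path)
>   dict = {}
>   File.foreach(path) do |line|
>     lemma, *forms = line.split
>     forms.each { |w| dict[w] = lemma }
>   end
>   dict
> end
>
> # Replace every whitespace-separated token that has a lemma;
> # tokens without an entry are printed unchanged.
> def lemmatize(dict, inp)
>   while line = inp.gets
>     puts line.split.map { |w| dict[w] || w }.join(' ')
>   end
> end
>
> lemmatize(read_dict(ARGV[0]), STDIN)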
>
>
> On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:
>>
>> Hi all,
>>
>> Thanks, I have been looking at the applications suggested. Unfortunately,
>> what I'm looking for is so simple that it might not be something that many
>> people actually use. My texts are untagged, and I'd like to keep them that
>> way for the moment. I actually want the lemmas to be inserted right there
>> in the text, so you get, for example: 'Yesterday I GO to the market.'
>>
>> I guess what I'm looking for is a kind of find/replace application that
>> can read off a file of (lemmatising) replacements like GO>go, went, gone,
>> going...!
>>
>> Apologies for not making this clearer!
>>
>> Duncan Hunter
>>
>
> --
> 'gʒɛgɔʃ
--
Matthew Purver <mpurver at stanford.edu>
Computational Semantics Laboratory, CSLI, Stanford