Fwd: [Corpora-List] Application for lemmatising corpora
Grzegorz Chrupała
grzegorz at pithekos.net
Fri Mar 23 17:44:18 UTC 2007
Something like the following Ruby script would do this. Each line of the
lemma file holds a lemma followed by its forms, e.g. "GO went gone going";
the text to lemmatise is read from standard input:
#!/usr/bin/ruby

# Build a hash mapping each inflected form to its lemma.
def read_dict(path)
  dict = {}
  File.foreach(path) do |line|
    words = line.split
    lemma = words.shift
    words.each { |w| dict[w] = lemma }
  end
  dict
end

# Copy inp to standard output, replacing every known form with its lemma.
def lemmatize(dict, inp)
  while line = inp.gets
    puts line.split.map { |w| dict[w] || w }.join(' ')
  end
end

lemmatize(read_dict(ARGV[0]), STDIN)
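
Note that the script above splits on whitespace and looks words up exactly
as written, so a form with punctuation attached ("went.") or a capital
letter ("Went") is left unchanged. If that matters, a variant along the
following lines might help. It is only a sketch: the names build_dict and
lemmatize_line are made up for illustration, and it assumes that a word is
any run of letters and that lookup should ignore case.

```ruby
# Hypothetical helper names; a sketch, not a tested tool.

# Build a form => lemma table from lines such as "GO went gone going".
# Forms are stored lowercased so that lookup can ignore case.
def build_dict(lines)
  dict = {}
  lines.each do |line|
    lemma, *forms = line.split
    next if lemma.nil?                      # skip blank lines
    forms.each { |w| dict[w.downcase] = lemma }
  end
  dict
end

# Replace each run of letters with its lemma when one is known.
# Punctuation and spacing are left exactly as they were.
def lemmatize_line(dict, line)
  line.gsub(/\p{L}+/) { |w| dict.fetch(w.downcase, w) }
end

dict = build_dict(["GO went gone going"])
puts lemmatize_line(dict, "Yesterday I went to the market.")
# prints: Yesterday I GO to the market.
```

Because a word is taken to be a run of letters, a contraction such as
"didn't" is looked up as "didn" and "t" separately; whether that is
acceptable depends on the lemma list.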
On 3/23/07, Hunter, Duncan <D.I.Hunter at warwick.ac.uk> wrote:
>
> Hi all,
>
> Thanks, I have been looking at the applications suggested. Unfortunately,
> what I'm looking for is so simple that it might not be something that many
> people actually use. My texts are untagged, and I'd like to keep them that
> way for the moment. I actually want the lemmas to be inserted right there in
> the text, so you get, for example: 'Yesterday I GO to the market.'
>
> I guess what I'm looking for is a kind of find/replace application that can
> read off a file of (lemmatising) replacements like GO>go, went, gone,
> going...!
>
> Apologies for not making this clearer!
>
> Duncan Hunter
>
--
'gʒɛgɔʃ