[Corpora-List] Language independent stemmer

Yannick Versley yversley at gmail.com
Fri Aug 22 09:29:33 UTC 2014


Dear Matías,

Have you looked at Morfette?
https://sites.google.com/site/morfetteweb/home
It works quite well for data-driven lemmatization.

The lemmatizer from the mate-tools
https://code.google.com/p/mate-tools/
could also work for you.

(Both of these tools need a corpus but not a dictionary. If you have a
dictionary but no corpus, you could try something like Morfessor, or try
to induce a morphological lexicon from Wiktionary.)

Best wishes,
Yannick


On Fri, Aug 22, 2014 at 11:12 AM, Matías Guzmán Naranjo <
mortem.dei at gmail.com> wrote:

> Dear all,
>
> I need a language-agnostic stemmer that can handle concatenative
> morphology for non-polysynthetic languages. Is there anything like that?
> The only thing I could find was a patent, but it looked fishy and I didn't
> see any demo of an implementation.
>
> I basically need to match inflected words in a natural text to a base form
> in a dictionary, so I had also thought of going the other way around:
> first removing regular morphemes (things like infinitive markers or
> nominative declension markers or whatever) from the dictionary entries and
> then trying to find, for each word in the corpus, the closest match among
> the stemmed entries. I really don't know if anyone has tried this and how
> well (or poorly) it performs. Does anyone know?
>
> Thanks!
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
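
The "other way around" idea in the quoted message (strip regular markers
from the dictionary entries, then look up each corpus word against the
stripped entries) could be sketched roughly as below. This is only a
minimal illustration under stated assumptions: the function names, the
suffix list and the Spanish sample lexicon are hypothetical, and "closest
match" is simplified here to "longest stripped entry that is a prefix of
the word". None of it is taken from Morfette, mate-tools or Morfessor.

```python
# Sketch only: the suffix list, sample lexicon, and matching rule are
# illustrative assumptions, not taken from any tool named in this thread.

def strip_suffix(entry, suffixes):
    """Remove the longest matching citation-form suffix, keeping at least 2 chars."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if entry.endswith(suf) and len(entry) - len(suf) >= 2:
            return entry[: -len(suf)]
    return entry


def build_stem_index(lexicon, suffixes):
    """Map each stripped stem to the dictionary entries that produce it."""
    index = {}
    for entry in lexicon:
        index.setdefault(strip_suffix(entry, suffixes), []).append(entry)
    return index


def match(word, index):
    """Return the entries whose stem is the longest prefix of `word`, or []."""
    # Try the whole word first, then ever shorter prefixes (minimum length 2).
    for end in range(len(word), 1, -1):
        entries = index.get(word[:end])
        if entries:
            return entries
    return []


if __name__ == "__main__":
    # Hypothetical toy data: Spanish infinitive markers and a four-word lexicon.
    index = build_stem_index(["cantar", "comer", "vivir", "casa"], ["ar", "er", "ir"])
    for token in ["cantamos", "comieron", "casas", "duermo"]:
        print(token, match(token, index))
```

Pure prefix matching of this kind only covers suffixing morphology with
stable stems. A form with a stem alternation (Spanish "duermo" against a
dictionary entry "dormir") finds no match, so a real system would need a
fuzzier similarity measure or a learned model at that step.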