[Corpora-List] Language independent stemmer

Matías Guzmán Naranjo mortem.dei at gmail.com
Fri Aug 22 09:12:56 UTC 2014


Dear all,

I need a language-agnostic stemmer that can handle concatenative morphology
for non-polysynthetic languages. Is there anything like that? The only thing
I could find was a patent, but it looked fishy and I didn't see any demo of
an implementation.

I basically need to match inflected words in a natural text to a base form
in a dictionary, so I had also thought of going the other way around:
first removing regular morphemes (things like infinitive markers or
nominative declension markers) from the dictionary entries, and then
trying to find, for each word in the corpus, the closest match among the
stemmed entries. I really don't know if anyone has tried this or how well
(or poorly) it performs. Does anyone know?
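
To make the second idea concrete, here is a minimal sketch of what I have
in mind. Everything in it is an assumption for illustration: the suffix
list, the toy Spanish lemmas, and the function names are hypothetical, and
"closest match" is simplified to "the longest stemmed entry that is a
prefix of the corpus word", which only covers suffixing morphology.

```python
# Hypothetical list of regular citation-form markers (here: Spanish
# infinitive endings). A real list would be supplied per language.
SUFFIXES = ("ar", "er", "ir")


def strip_suffix(entry, suffixes=SUFFIXES):
    """Remove one regular marker from a dictionary entry, longest first."""
    for suf in sorted(suffixes, key=len, reverse=True):
        # Keep at least two characters so very short entries are not emptied.
        if entry.endswith(suf) and len(entry) > len(suf) + 1:
            return entry[: -len(suf)]
    return entry


def build_index(lemmas):
    """Map each dictionary entry to its stemmed form."""
    return {lemma: strip_suffix(lemma) for lemma in lemmas}


def closest_lemma(word, index):
    """Return the entry whose stem is the longest prefix of `word`, or None."""
    best, best_len = None, 0
    for lemma, stem in index.items():
        if word.startswith(stem) and len(stem) > best_len:
            best, best_len = lemma, len(stem)
    return best


# Toy dictionary, assumed for the example only.
index = build_index(["cantar", "cansar", "comer", "vivir"])
for token in ["cantamos", "comieron", "vivo", "xyz"]:
    print(token, "->", closest_lemma(token, index))
```

Stem alternations, prefixes and irregular forms would all be missed by
this, so some fuzzier similarity measure would probably be needed in
practice.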

Thanks!