[Corpora-List] Language independent stemmer

Djamé Seddah djame.seddah at free.fr
Fri Aug 22 20:29:51 UTC 2014


Dear all,
Actually, if you add an external lexicon to Morfette, it will greatly improve accuracy on unknown words.

Best,
Djamé


On 22 August 2014, at 11:29, Yannick Versley wrote:

> Dear Matías,
> 
> have you looked at Morfette?
> https://sites.google.com/site/morfetteweb/home
> It works quite well for data-driven lemmatization.
> 
> The lemmatizer from the mate-tools
> https://code.google.com/p/mate-tools/
> could also work for you.
> 
> (Both of these tools need a corpus but not a dictionary. If you have a dictionary but no corpus,
> you could try something like Morfessor, or try to induce a morphology lexicon from Wiktionary.)
> 
> Best wishes,
> Yannick
> 
> 
> On Fri, Aug 22, 2014 at 11:12 AM, Matías Guzmán Naranjo <mortem.dei at gmail.com> wrote:
> Dear all,
> 
> I need a language-agnostic stemmer that can handle concatenative morphology for non-polysynthetic languages. Is there anything like that? The only thing I could find was a patent, but it looked fishy and I didn't see any demo of an implementation.
> 
> I basically need to match inflected words in a natural text to a base form in a dictionary, so I had also thought of going the other way around: first removing regular morphemes (things like infinitive markers or nominative declension markers) from the dictionary entries, and then trying to find, for each word in the corpus, the closest match among the stemmed entries. I really don't know whether anyone has tried this or how well (or poorly) it performs. Does anyone know?
> 
> Thanks!
> 
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

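The dictionary-side approach described in the original question (strip regular citation-form markers from the dictionary entries, then match each corpus word against the stripped stems) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested method from anyone in this thread: the suffix list is a hypothetical one for Spanish infinitives, "closest match" is interpreted as "longest stripped stem that is a prefix of the word", and all function names are made up for the example. It only covers suffixing morphology.

```python
# Hypothetical citation-form endings (Spanish infinitive markers); an assumption,
# to be replaced by whatever regular markers the target language's dictionary uses.
CITATION_SUFFIXES = ("ar", "er", "ir")


def strip_suffix(entry, suffixes=CITATION_SUFFIXES, min_stem=2):
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if entry.endswith(suffix) and len(entry) - len(suffix) >= min_stem:
            return entry[: -len(suffix)]
    return entry


def build_index(dictionary):
    """Map each stripped stem to the dictionary entries that produce it."""
    index = {}
    for entry in dictionary:
        index.setdefault(strip_suffix(entry), []).append(entry)
    return index


def lookup(word, index, min_stem=2):
    """Return the entries whose stripped stem is the longest prefix of word."""
    for end in range(len(word), min_stem - 1, -1):
        entries = index.get(word[:end])
        if entries:
            return entries
    return []


index = build_index(["hablar", "comer", "vivir", "casa"])
for token in ["hablamos", "comieron", "casas", "xyz"]:
    print(token, lookup(token, index))
```

A prefix match like this will over-generate (for example, any word starting with "com" is mapped to "comer"), so in practice it would probably need a frequency or edit-distance filter on top, and it does nothing for stem alternations or prefixing languages.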