[Corpora-List] Language independent stemmer

Matías Guzmán Naranjo mortem.dei at gmail.com
Fri Aug 22 09:33:49 UTC 2014


Thank you Yannick, I'll take a look. I do have a corpus and dictionary.


2014-08-22 11:29 GMT+02:00 Yannick Versley <yversley at gmail.com>:

> Dear Matías,
>
> have you looked at Morfette
> https://sites.google.com/site/morfetteweb/home
> it works quite well for data-driven lemmatization.
>
> The lemmatizer from the mate-tools
> https://code.google.com/p/mate-tools/
> could also work for you.
>
> (Both of these tools need a corpus but not a dictionary; if you have a
> dictionary but no corpus,
> you could try something like Morfessor, or try to induce a morphology
> lexicon from Wiktionary.)
>
> Best wishes,
> Yannick
>
>
> On Fri, Aug 22, 2014 at 11:12 AM, Matías Guzmán Naranjo <
> mortem.dei at gmail.com> wrote:
>
>> Dear all,
>>
>> I need a language-agnostic stemmer that can handle concatenative
>> morphology for non-polysynthetic languages. Is there anything like that?
>> The only thing I could find was a patent, but it looked fishy and I didn't
>> see any demo of an implementation.
>>
>> I basically need to match inflected words in a natural text to a base
>> form in a dictionary, so I had also thought of going the other way around:
>> first removing regular morphemes (things like infinitive markers or
>> nominative declension markers or whatever) from the dictionary entries and
>> then trying to find, for each word in the corpus, the closest match among
>> the stemmed entries. I really don't know if anyone has tried this and how
>> well (or poorly) it performs. Does anyone know?
>>
>> Thanks!
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
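The "other way around" idea from the original question (strip regular
citation-form markers from the dictionary entries, then look up each corpus
word against the stripped stems) could be sketched roughly as below. This is
only an illustration under stated assumptions, not a tested method: the suffix
list, the Spanish toy data, and the function names stem_entry, build_index and
match_word are all hypothetical, and "closest match" is simplified here to
"longest stem that is a prefix of the word".

```python
from collections import defaultdict

# Hypothetical suffix list: regular citation-form markers to strip from
# dictionary entries (Spanish infinitive endings, purely as an illustration).
CITATION_SUFFIXES = ("ar", "er", "ir")


def stem_entry(lemma, suffixes=CITATION_SUFFIXES, min_stem=2):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if lemma.endswith(suffix) and len(lemma) - len(suffix) >= min_stem:
            return lemma[: -len(suffix)]
    return lemma


def build_index(lemmas):
    """Map each stripped stem to the dictionary lemmas that produce it."""
    index = defaultdict(list)
    for lemma in lemmas:
        index[stem_entry(lemma)].append(lemma)
    return index


def match_word(word, index):
    """Return the lemmas whose stem is the longest prefix of word, or []."""
    for end in range(len(word), 0, -1):
        candidates = index.get(word[:end])
        if candidates:
            return candidates
    return []


# Toy dictionary and corpus tokens (assumed data, not from the thread).
lemmas = ["cantar", "comer", "vivir", "casa"]
index = build_index(lemmas)
for token in ["cantamos", "comieron", "vivo", "casas"]:
    print(token, "->", match_word(token, index))
```

Prefix lookup like this only covers purely suffixing inflection on an
unchanged stem; stem alternations, prefixes, or ambiguous stems would need a
fuzzier similarity measure or a trained lemmatizer such as the ones suggested
in the reply above.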