[Corpora-List] Korean and Japanese stemming

Pontus Stenetorp pontus at stenetorp.se
Fri Mar 2 09:37:08 UTC 2012


Dear Stefan (and everyone else),

To the best of my knowledge, there is no widely adopted stemming
algorithm for Japanese. Intuitively (which is dangerous), writing one
should be less involved than for English, since Japanese morphology is
fairly straightforward and has fewer exceptions.

If you want something "off-the-rack", I highly recommend "MeCab: Yet
Another Part-of-Speech and Morphological Analyzer":

    http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html

It gives you the base form, reading, pronunciation and more for each
token. I have personally only used it for word segmentation, but it
sounds like a reasonable fit for your purposes.
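
As a rough illustration of using the base form as a stand-in for a stem,
here is a minimal sketch. It assumes the Python binding (`import MeCab`)
and the IPADIC dictionary, where the seventh comma-separated feature
field is the base form; other dictionaries such as UniDic order their
fields differently. The names `base_forms`, `analyse` and `SAMPLE` are
made up for this example, and `SAMPLE` is a hand-written illustration of
the default output format, not captured program output.

```python
# Illustrative sample of MeCab's default output format with IPADIC:
# one "surface<TAB>feature,feature,..." line per token, then "EOS".
SAMPLE = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "まし\t助動詞,*,*,*,特殊・マス,連用形,ます,マシ,マシ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)

# Assumption: IPADIC feature layout, base form at index 6.
BASE_FORM_FIELD = 6


def base_forms(mecab_output):
    """Return (surface, base form) pairs from MeCab's default text output."""
    pairs = []
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        # Unknown words may have fewer fields or "*" as the base form;
        # fall back to the surface string in that case.
        if len(fields) > BASE_FORM_FIELD and fields[BASE_FORM_FIELD] != "*":
            base = fields[BASE_FORM_FIELD]
        else:
            base = surface
        pairs.append((surface, base))
    return pairs


def analyse(text):
    """Run MeCab on text (requires MeCab and a dictionary to be installed)."""
    import MeCab  # imported lazily so base_forms() works without MeCab

    tagger = MeCab.Tagger()
    return base_forms(tagger.parse(text))
```

Calling `base_forms(SAMPLE)` pairs each surface token with its
dictionary form, so the inflected "食べ" maps to "食べる". Whether that
is close enough to stemming depends on the application.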

Best regards,
    Pontus Stenetorp
    University of Tokyo, Tokyo, Japan

On 2 March 2012 18:16, Stefan Bordag <sbordag at informatik.uni-leipzig.de> wrote:
> Dear all,
>
> Does anyone know whether someone has written a simple Porter-style stemmer
> or a similar set of rules for stemming Korean texts? Same for Japanese
> texts. It doesn't need to be anything fancy. But using Google Translate and
> search engine results has not led anywhere, or I am looking in the wrong
> places.
>
> Thank you very much in advance,
> Stefan Bordag
>
> --
> ---------------------------------------------
> - Dr. Stefan Bordag                         -
> - 0341 49 26 196                            -
> - sbordag at informatik.uni-leipzig.de         -
> ---------------------------------------------
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora



