[Corpora-List] How to do Japanese word segmentation using an extra term list?

hf.jiang hf.jiang at gmail.com
Fri Oct 21 08:43:45 UTC 2011


Thanks Pham.


I have found the solution.
The manual page (http://mecab.sourceforge.net/dic.html) includes what I need.
I have also asked a friend who knows Japanese to explain it to me.
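
For readers following the thread, here is a rough sketch of the user-dictionary approach that manual page describes. It is not taken from this thread and rests on several assumptions: an IPADIC-style system dictionary (13 CSV columns), a default part of speech of 名詞,固有名詞,一般 for every term, and an arbitrary cost of 1000. The helper names `term_to_row` and `terms_to_csv` are hypothetical. As far as I understand the manual, leaving the left and right context IDs empty lets mecab-dict-index assign them.

```python
import csv
import io

# Assumption: IPADIC-style columns, i.e.
# surface, left-id, right-id, cost, POS, POS sub1, POS sub2, POS sub3,
# conjugation type, conjugation form, base form, reading, pronunciation.


def term_to_row(term, cost=1000, reading="*"):
    """Build one user-dictionary row for a term (hypothetical helper)."""
    return [
        term,       # surface form
        "",         # left context ID, left empty on purpose (assumption)
        "",         # right context ID, left empty on purpose (assumption)
        str(cost),  # word cost: a lower value makes the entry more preferred
        "名詞", "固有名詞", "一般", "*", "*", "*",
        term,       # base form
        reading,    # reading, "*" when unknown
        reading,    # pronunciation, "*" when unknown
    ]


def terms_to_csv(terms, cost=1000):
    """Turn a term list into CSV text for a MeCab user dictionary."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for term in terms:
        writer.writerow(term_to_row(term, cost))
    return buf.getvalue()


if __name__ == "__main__":
    # Save the output as user.csv, then (paths depend on the installation):
    #   mecab-dict-index -d <system-dic-dir> -u user.dic -f utf-8 -t utf-8 user.csv
    #   mecab -u user.dic
    print(terms_to_csv(["形態素解析", "自然言語処理"]), end="")
```

The cost value is the main knob here: if MeCab still splits a term, lowering its cost is the usual thing to try.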


I wish my English were better, so that I could provide colleagues with an English version of the manual.


-Hongfei Jiang
 
 
------------------ Original ------------------
From:  "Minh Pham"<minhpham0902 at gmail.com>;
Date:  Thu, Oct 20, 2011 04:04 PM
To:  "Adam Kilgarriff"<adam at lexmasterclass.com>; 
Cc:  "hf.jiang"<hf.jiang at gmail.com>; "corpora"<corpora at uib.no>; "Hiram Calvo"<hiramcalvo at gmail.com>; "Jan Pomikále"<xpomikal at fi.muni.cz>; 
Subject:  Re: [Corpora-List] How to do Japanese word segmentation using an extra term list?

 
Hi,

Could you please tell us exactly what the input is and what the desired output is?


By the way, after installing the MeCab tool, you can view its help from the command line by typing:
 

mecab.exe --help


The help is in English.


Best regards,
Pham

On Thu, Oct 20, 2011 at 4:22 PM, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
 >  However, since almost all of the user manual is in Japanese, I cannot understand it completely.
 

We have the same problem; are there any English versions anywhere (especially for MeCab)? Pointers and advice appreciated.
 

Adam


On 20 October 2011 08:08, hf.jiang <hf.jiang at gmail.com> wrote:
 


 Hi all,


    I'm currently trying to process Japanese texts.
    Some friends suggested that I use ChaSen or MeCab.
    However, since almost all of the user manual is in Japanese, I cannot understand it completely.
    I would like the segmentation tool to give preference to the words in my term list.
    
    Note that I do not have enough gold data to train the tools, so an off-the-shelf tool suits me better.
 

    Looking forward to your reply, thanks.


-Hongfei Jiang



_______________________________________________
 UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
 Corpora mailing list
 Corpora at uib.no
 http://mailman.uib.no/listinfo/corpora
 





-- 
========================================
Adam Kilgarriff                  adam at lexmasterclass.com
Director                         Lexical Computing Ltd
Visiting Research Fellow         University of Leeds
Corpora for all with the Sketch Engine
DANTE: a lexical database for English
========================================


 
 
 





-- 
Pham Quang Nhat Minh (Mr)
PhD student
NLP Laboratory - School of Information Science - JAIST
1-1 Asahidai, Nomi, 923-1292 Japan
Email: minhpqn at jaist.ac.jp
 Web: http://www.jaist.ac.jp/index-e.html