[Corpora-List] How to do Japanese word segmentation using extraterm list?
hf.jiang
hf.jiang at gmail.com
Fri Oct 21 08:43:45 UTC 2011
Thanks Pham.
I have found the solution.
The manual page (http://mecab.sourceforge.net/dic.html) includes what I need.
I also asked a friend who knows Japanese to explain it to me.
I wish my English were better, so that I could provide colleagues with an English version of the manual.
-Hongfei Jiang
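For readers who land on this thread with the same question, here is a minimal sketch of the kind of preparation the manual page describes: turning a term list into a CSV file that can be compiled into a MeCab user dictionary. It assumes the 13-column layout of the IPA dictionary (surface form, left context ID, right context ID, cost, part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, pronunciation). The function names, the sample term, and the context ID and cost values are hypothetical placeholders, not values taken from any real dictionary; suitable IDs and costs have to be chosen to match the system dictionary in use.

```python
import csv
import io

# Assumed part-of-speech columns for a general noun in an IPA-style dictionary:
# POS, POS subcategory 1, POS subcategory 2, POS subcategory 3.
NOUN_POS = ("名詞", "一般", "*", "*")


def user_dict_rows(terms, left_id, right_id, cost, pos=NOUN_POS):
    """Build one 13-column row per (surface, reading) pair.

    left_id, right_id and cost are supplied by the caller; the values
    used in the demo below are placeholders only.
    """
    rows = []
    for surface, reading in terms:
        rows.append([
            surface, left_id, right_id, cost,
            *pos,
            "*", "*",          # conjugation type and form (none for nouns)
            surface,           # base form, assumed equal to the surface form
            reading, reading,  # reading and pronunciation, assumed identical
        ])
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with one entry per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical term list: (surface form, katakana reading).
    terms = [("形態素解析器", "ケイタイソカイセキキ")]
    print(to_csv(user_dict_rows(terms, left_id=0, right_id=0, cost=1000)), end="")
```

The resulting CSV would then be compiled with MeCab's mecab-dict-index tool and loaded with the -u option of mecab. The exact paths and character-set flags depend on the installation, so they are not shown here. In general a lower cost makes an entry more likely to be selected during segmentation.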
------------------ Original ------------------
From: "Minh Pham"<minhpham0902 at gmail.com>;
Date: Thu, Oct 20, 2011 04:04 PM
To: "Adam Kilgarriff"<adam at lexmasterclass.com>;
Cc: "hf.jiang"<hf.jiang at gmail.com>; "corpora"<corpora at uib.no>; "Hiram Calvo"<hiramcalvo at gmail.com>; "Jan Pomikále"<xpomikal at fi.muni.cz>;
Subject: Re: [Corpora-List] How to do Japanese word segmentation using extraterm list?
Hi,
Could you please tell us exactly what the input is and what the desired output is?
By the way, after installing the mecab tool, you can view its help from the command line by typing:
mecab.exe --help
The help is in English.
Best regards,
Pham
On Thu, Oct 20, 2011 at 4:22 PM, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
> However, since almost all of the user manual is in Japanese, I cannot understand it completely.
We have the same problem; are there any English versions anywhere (especially for mecab)? Pointers and advice appreciated.
Adam
On 20 October 2011 08:08, hf.jiang <hf.jiang at gmail.com> wrote:
Hi all,
I'm currently trying to process Japanese texts.
Some friends suggested that I use ChaSen or MeCab.
However, since almost all of the user manual is in Japanese, I cannot understand it completely.
What I need is a segmentation tool that gives preference to the words in my term list.
Note that I do not have enough gold data to train the tools, so an off-the-shelf tool is better for me.
Looking forward to your reply, thanks.
-Hongfei Jiang
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
--
========================================
Adam Kilgarriff, adam at lexmasterclass.com
Director, Lexical Computing Ltd
Visiting Research Fellow, University of Leeds
Corpora for all with the Sketch Engine
DANTE: a lexical database for English
========================================
--
Pham Quang Nhat Minh (Mr)
PhD student
NLP Laboratory - School of Information Science - JAIST
1-1 Asahidai, Nomi, 923-1292 Japan
Email: minhpqn at jaist.ac.jp
Web: http://www.jaist.ac.jp/index-e.html