[Corpora-List] How to do Japanese word segmentation using an extra term list?

Minh Pham minhpham0902 at gmail.com
Thu Oct 20 08:04:45 UTC 2011


Hi,

Could you please tell us exactly what the input is and what the desired output is?

By the way, after installing the MeCab tool, you can view its help from the
command line by typing:

mecab.exe --help

The help is in English.
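Beyond the help text, MeCab can also be driven from a script. The following is a minimal Python sketch, assuming the `mecab` executable is on the PATH (on Windows it may be `mecab.exe` or need a full path) and that the installed dictionary uses UTF-8. `build_mecab_command` and `segment` are hypothetical helper names. The sketch uses MeCab's `-O wakati` output format, which prints only the space-separated words, and the `-u` option to load an optional compiled user dictionary.

```python
import subprocess
from typing import List, Optional


def build_mecab_command(userdic: Optional[str] = None, wakati: bool = True) -> List[str]:
    """Build a MeCab command line (hypothetical helper).

    -O wakati : output only the segmented words, separated by spaces
    -u FILE   : load a compiled user dictionary on top of the system one
    """
    cmd = ["mecab"]  # assumption: binary is on PATH; may be "mecab.exe" on Windows
    if wakati:
        cmd += ["-O", "wakati"]
    if userdic is not None:
        cmd += ["-u", userdic]
    return cmd


def segment(text: str, userdic: Optional[str] = None, encoding: str = "utf-8") -> List[str]:
    """Run MeCab on `text` and return the segmented words as a flat list.

    Assumption: `encoding` matches the charset of the installed dictionary,
    since MeCab reads and writes bytes in that charset.
    """
    result = subprocess.run(
        build_mecab_command(userdic),
        input=text.encode(encoding),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if mecab exits with an error
    )
    # wakati output is one line per input line; split() flattens all words
    return result.stdout.decode(encoding).split()
```

For example, `segment(text, userdic="terms.dic")` would segment `text` while also consulting a user dictionary named `terms.dic` (a hypothetical file name). Keeping the command builder separate makes it easy to check which options are passed without running MeCab.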

Best regards,
Pham

On Thu, Oct 20, 2011 at 4:22 PM, Adam Kilgarriff <adam at lexmasterclass.com> wrote:

> >  However, since almost all of the user manual is in Japanese, I cannot
> understand it completely.
>
> We have the same problem; are there any English versions anywhere
> (especially for MeCab)? Pointers and advice appreciated.
>
> Adam
>
> On 20 October 2011 08:08, hf.jiang <hf.jiang at gmail.com> wrote:
>
>> Hi all,
>>
>>     I'm currently trying to process Japanese texts.
>>     Some friends suggested that I use ChaSen or MeCab.
>>     However, since almost all of the user manual is in Japanese, I cannot
>> understand it completely.
>>     I would like the segmentation tool to give preference to the words
>> in my term list.
>>
>>     Note that I do not have enough gold-standard data to train the tools,
>>  so an off-the-shelf tool would suit me better.
>>
>>     Looking forward to your reply, thanks.
>>
>> -Hongfei Jiang
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/>
> adam at lexmasterclass.com
> Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
>
> Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
>
> *Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
>
> *DANTE: a lexical database for English* <http://www.webdante.com>
> ========================================
>

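For the term-list requirement in the quoted question, MeCab's usual mechanism is a user dictionary: a CSV file of extra words compiled with `mecab-dict-index` and loaded with `mecab -u`. The following is a minimal Python sketch that turns a plain term list into such CSV rows. It assumes the IPAdic feature layout (other dictionaries such as UniDic use different columns), tags every term as a proper noun (名詞,固有名詞,一般), and gives each a low word cost; in MeCab a lower cost makes a word more likely to be chosen. The cost value 100, the helper names `terms_to_userdic_rows` and `write_userdic_csv`, and leaving the left/right context IDs empty (assumed to be filled in by `mecab-dict-index`) are all illustrative assumptions.

```python
from typing import Iterable, List

# Assumed IPAdic-style row layout (13 fields):
# surface, left-id, right-id, cost, POS, POS-sub1, POS-sub2, POS-sub3,
# conjugation type, conjugation form, base form, reading, pronunciation


def terms_to_userdic_rows(terms: Iterable[str], cost: int = 100) -> List[str]:
    """Convert a term list into user-dictionary CSV rows (hypothetical helper)."""
    rows = []
    seen = set()
    for raw in terms:
        term = raw.strip()
        # Skip blank lines, duplicates, and terms containing the field separator.
        if not term or term in seen or "," in term:
            continue
        seen.add(term)
        fields = [term, "", "", str(cost),        # empty context IDs (assumption)
                  "名詞", "固有名詞", "一般", "*",  # POS: proper noun (assumption)
                  "*", "*",                        # no conjugation for nouns
                  term, "*", "*"]                  # base form; reading unknown
        rows.append(",".join(fields))
    return rows


def write_userdic_csv(terms: Iterable[str], path: str, cost: int = 100,
                      encoding: str = "utf-8") -> None:
    """Write the rows to `path`, one per line, with Unix line endings."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        for row in terms_to_userdic_rows(terms, cost):
            f.write(row + "\n")
```

The resulting CSV would then be compiled with something like `mecab-dict-index -d <system dictionary directory> -u terms.dic -f utf-8 -t utf-8 terms.csv` (`-f` is the CSV charset, `-t` the charset of the compiled dictionary; the location of `mecab-dict-index` varies by installation) and used with `mecab -u terms.dic`. If your MeCab version does not fill in empty context IDs, look up the IDs for 名詞,固有名詞,一般 in the dictionary's `left-id.def` and `right-id.def`.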

-- 
Pham Quang Nhat Minh (Mr)
PhD student
NLP Laboratory - School of Information Science - JAIST
1-1 Asahidai, Nomi, 923-1292 Japan
Email: minhpqn at jaist.ac.jp
Web: http://www.jaist.ac.jp/index-e.html