[Corpora-List] Simple tokenizer for Chinese

Hongyin Tao bbs.lists at gmail.com
Mon Oct 27 18:34:52 UTC 2008


One of the best tokenizers is ICTCLAS, developed by researchers at the
Chinese Academy of Sciences.

http://www.ictclas.org/

If you have more questions regarding Chinese corpora and corpus tools, visit

http://www.corpus4u.org


Hongyin Tao
Professor of Chinese Language and Linguistics
& Applied Linguistics and TESL
University of California, Los Angeles (UCLA)
Department of Asian Languages and Cultures
290 Royce Hall
Los Angeles, CA 90095-1540
Tel: (310) 206-6872
Fax: (310) 825-8808


On Mon, Oct 27, 2008 at 3:04 AM, Emiliano Guevara <emiliano.guevara at unibo.it> wrote:

> Dear all,
>
> could you please point me to simple tokenizers for Chinese text
> corpora?
> They will be used by a student with a very basic background, so
> standalone or GUI options would be preferred.
>
> Thanks in advance,
>
> E.
>
> ************************************************************************
> Emiliano R. Guevara
> Facoltà di Lingue e Lett. Straniere - Dip. di Lingue e Lett. Straniere
> Università di Bologna - Via Cartoleria 5 (40124) Bologna, Italia
>   http://morbo.lingue.unibo.it/
>   emiliano.guevara at unibo.it  -  emiguevara at gmail.com
> ************************************************************************
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>