25.2588, FYI: Chinese Spoken Wordlist Database

The LINGUIST List linguist at linguistlist.org
Tue Jun 17 09:17:38 UTC 2014


LINGUIST List: Vol-25-2588. Tue Jun 17 2014. ISSN: 1069 - 4875.

Subject: 25.2588, FYI: Chinese Spoken Wordlist Database

Moderators: Damir Cavar, Eastern Michigan U <damir at linguistlist.org>
            Malgorzata E. Cavar, Eastern Michigan U <gosia at linguistlist.org>

Reviews: reviews at linguistlist.org
Anthony Aristar <aristar at linguistlist.org>
Helen Aristar-Dry <hdry at linguistlist.org>
Mateja Schuck, U of Wisconsin Madison

Homepage: http://linguistlist.org

Editor for this issue: Uliana Kazagasheva <uliana at linguistlist.org>
================================================================  


Date: Tue, 17 Jun 2014 05:16:58
From: Shu-Chuan Tseng [tsengsc at gate.sinica.edu.tw]
Subject: Chinese Spoken Wordlist Database

The "Chinese Spoken Wordlist" was derived from the transcripts of 85 Taiwan Mandarin conversations collected and processed at Academia Sinica, totaling 42 hours of recorded speech. The recordings were made from 2001 to 2003, and the speakers' ages ranged from 14 to 63. The transcripts were automatically processed by the CKIP word segmentation and POS tagging system. The results of word segmentation, POS tagging, and character-to-Pinyin conversion, as well as homographs, were then manually corrected and edited. The resulting wordlist consists of 16,683 word types and 405,435 word tokens, equivalent to 607,016 syllables.
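
For readers who want a quick feel for these counts, the following is a minimal sketch in Python that derives a few simple ratios from the figures reported above. The function name and the choice of ratios are illustrative assumptions and are not part of the wordlist distribution; only the four input numbers come from the announcement.

```python
# Counts as reported in the announcement; everything else is illustrative.
WORD_TYPES = 16_683
WORD_TOKENS = 405_435
SYLLABLES = 607_016
HOURS = 42


def summarize(types, tokens, syllables, hours):
    """Return simple ratios derived from the reported corpus counts.

    `summarize` is a hypothetical helper written for this sketch.
    """
    return {
        "type_token_ratio": types / tokens,        # distinct words per running word
        "tokens_per_type": tokens / types,         # average occurrences of each word type
        "syllables_per_token": syllables / tokens, # average word length in syllables
        "tokens_per_hour": tokens / hours,         # average running words per recorded hour
    }


stats = summarize(WORD_TYPES, WORD_TOKENS, SYLLABLES, HOURS)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

On these figures the average word is roughly 1.5 syllables long, and each word type occurs about 24 times on average.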

To access the "Chinese Spoken Wordlist", please see:
http://mmc.sinica.edu.tw/resources_e_02.html 



Linguistic Field(s): Computational Linguistics
                     Language Acquisition

Subject Language(s): Chinese, Mandarin (cmn)

----------------------------------------------------------
LINGUIST List: Vol-25-2588	
----------------------------------------------------------


