23.2763, FYI: Taiwan Mandarin Spoken Wordlist

linguist at linguistlist.org
Tue Jun 19 14:37:31 UTC 2012


LINGUIST List: Vol-23-2763. Tue Jun 19 2012. ISSN: 1069 - 4875.

Subject: 23.2763, FYI: Taiwan Mandarin Spoken Wordlist

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
         Monica Macaulay, U of Wisconsin Madison
         Rajiv Rao, U of Wisconsin Madison
         Joseph Salmons, U of Wisconsin Madison
         Anja Wanner, U of Wisconsin Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Brent Miller <brent at linguistlist.org>
================================================================  


Date: Tue, 19 Jun 2012 10:37:27
From: Shu-Chuan Tseng [tsengsc at gate.sinica.edu.tw]
Subject: Taiwan Mandarin Spoken Wordlist

The "Taiwan Mandarin Spoken Wordlist" was derived from the
transcripts of 85 Taiwan Mandarin conversations collected and
processed at Academia Sinica, totaling 42 hours of recorded
speech. The recordings were made between 2001 and 2003, and the
speakers' ages ranged from 14 to 63. The transcripts were first
processed automatically by the CKIP word segmentation and POS
tagging system. The word segmentation, POS tags, character-to-Pinyin
conversion, and homographs were then manually corrected and
edited. The resulting wordlist consists of 16,683 word types and
405,435 word tokens, equivalent to 607,016 syllables.
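
For readers who want a quick sense of scale, the totals quoted above can be turned into a few descriptive ratios. The sketch below uses only the figures from this announcement; the function and key names are illustrative and are not part of the wordlist or of any CKIP tool.

```python
# Totals as quoted in the announcement.
WORD_TYPES = 16_683
WORD_TOKENS = 405_435
SYLLABLES = 607_016
HOURS = 42


def summary_ratios(types, tokens, syllables, hours):
    """Return simple descriptive ratios derived from the corpus totals."""
    return {
        "type_token_ratio": types / tokens,        # lexical diversity
        "tokens_per_type": tokens / types,         # mean frequency of a word type
        "syllables_per_token": syllables / tokens, # mean word length in syllables
        "tokens_per_hour": tokens / hours,         # rough speech density
    }


ratios = summary_ratios(WORD_TYPES, WORD_TOKENS, SYLLABLES, HOURS)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

These are whole-corpus averages only; they say nothing about variation across speakers or conversations.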

The Wordlist can be downloaded at

http://mmc.sinica.edu.tw/resources_e_01.htm 



Linguistic Field(s): Text/Corpus Linguistics
