7.1767, Sum: Asian Word Frequency

linguist at linguistlist.org
Fri Dec 13 13:48:10 UTC 1996


LINGUIST List:  Vol-7-1767. Fri Dec 13 1996. ISSN: 1068-4875.
 
Subject: 7.1767, Sum: Asian Word Frequency
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>
 
Review Editor:     Andrew Carnie <carnie at linguistlist.org>
 
Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>
Technical Editor:  Ron Reck <ron at linguistlist.org>
 
Software development: John H. Remmers <remmers at emunix.emich.edu>
 
Editor for this issue: T. Daniel Seely <seely at linguistlist.org>
 
=================================Directory=================================
 
1)
Date:  Fri, 13 Dec 1996 09:12:01 -0700
From:  Peter.Ross at anu.edu.au
Subject:   Asian Word Frequency
 
-------------------------------- Message 1 -------------------------------
 
Date:  Fri, 13 Dec 1996 09:12:01 -0700
From:  Peter.Ross at anu.edu.au
Subject:   Asian Word Frequency
 
 
Recently I posted a query for information on Word Frequency for
East/Southeast Asian languages. Following is a summary of responses
received (on Chinese) as well as some additional information from enquiries
off the list (on Thai). There is still a lot of work to be done here. Word
segmentation has been mentioned as a key issue for both Chinese and Thai.
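
To illustrate why segmentation matters for frequency counts, here is a minimal sketch in Python. It is not taken from any of the resources listed below: the segmenter is a simple greedy forward maximum-matching routine, and the two toy lexicons (LEXICON_A, LEXICON_B) and the sample string are assumptions chosen only for illustration. The same unspaced text yields different word counts depending on which lexicon is used.

```python
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry (up to max_len
    characters); if nothing matches, fall back to a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def word_frequencies(text, lexicon):
    """Count words after segmenting the text with the given lexicon."""
    return Counter(segment(text, lexicon))


# Hypothetical toy data: one unspaced string, two candidate lexicons.
TEXT = "研究生命起源"
LEXICON_A = {"研究", "研究生", "生命", "起源"}
LEXICON_B = {"研究", "生命", "起源"}

freq_a = word_frequencies(TEXT, LEXICON_A)  # counts 研究生, 命, 起源
freq_b = word_frequencies(TEXT, LEXICON_B)  # counts 研究, 生命, 起源
```

With LEXICON_A the string is split as 研究生 / 命 / 起源, and with LEXICON_B as 研究 / 生命 / 起源, so a word such as 生命 is counted in one case and missed in the other. Any frequency list for Chinese or Thai therefore depends on the segmentation conventions behind it.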
 
MANDARIN CHINESE:
 
1. Beijing Yuyan Xueyuan Yuyan Jiaoxue Yanjiu Suo. 1986. Xiandai Hanyu
Pinlu Cidian [Modern Chinese Frequency Dictionary]. Beijing:Beijing Yuyan
Xueyuan Chubanshe.  [Beijing Institute of Language Press].
 
2. "Dictionary of Usage Frequency of Modern Chinese Words", 1990. Beijing
University of Aeronautics and Astronautics Press.
 
3. Linguistic Data Consortium (http://www.ldc.upenn.edu): Corpus of Mandarin
conversational speech. 100 30-minute telephone conversations, with 10
minutes of each conversation transcribed. Soon to be published on CD-ROM.
In principle, this corpus enables estimation of word frequencies for
contemporary conversational Mandarin.
 
4. There are several automatically or semi-automatically segmented
Mandarin text corpora available, e.g. at Academia Sinica, Taiwan.
 
THAI:
 
1. Yuen Poovoravan, Kasetsart University, has done some preliminary research
on Thai word frequency.
 
2. Some basic statistics (grammatical categories):
<http://www.links.nectec.or.th>
<http://tanaka-www.cs.titech.ac.jp/~virach/profile.html>
 
3. Virach Sornlertlamvanich <virach at cs.titech.ac.jp> is involved in NLP
research for Thai. Results are not yet available.
 
Responses provided by:
Xiaolin Zhou <zhou at psychology.bbk.ac.uk>
Phillip Elliot <FSKD94A at prodigy.com>
Mark Liberman <myl at unagi.cis.upenn.edu>
Hugh Thaweesak Koanantakool <htk at nectec.or.th>
Thatsanee Charoenporn <thatsc at nwg.nectec.or.th>
Virach Sornlertlamvanich <virach at cs.titech.ac.jp>
 
Peter Ross
Thai/Linguistics
Australian National University
 
---------------------------------------------------------------------------
LINGUIST List: Vol-7-1767


