11.2532, Sum: Korean Hangul Frequency

The LINGUIST Network linguist at linguistlist.org
Fri Nov 24 17:28:04 UTC 2000


LINGUIST List:  Vol-11-2532. Fri Nov 24 2000. ISSN: 1068-4875.

Subject: 11.2532, Sum: Korean Hangul Frequency

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Editors: Karen Milligan, Wayne State U. <karen at linguistlist.org>
         Michael Appleby, E. Michigan U. <michael at linguistlist.org>
         Rob Beltz, E. Michigan U. <rob at linguistlist.org>
         Lydia Grebenyova, E. Michigan U. <lydia at linguistlist.org>
         Jody Huellmantel, Wayne State U. <jody at linguistlist.org>
         Marie Klopfenstein, Wayne State U. <marie at linguistlist.org>
         Naomi Ogasawara, E. Michigan U. <naomi at linguistlist.org>
         James Yuells, Wayne State U. <james at linguistlist.org>
         Ljuba Veselinova, Stockholm U. <ljuba at linguistlist.org>

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.


Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 20 Nov 2000 09:38:36 -0700
From:  Tim Mills <tmills at zicorp.com>
Subject:  Korean Hangul frequency

-------------------------------- Message 1 -------------------------------

Date:  Mon, 20 Nov 2000 09:38:36 -0700
From:  Tim Mills <tmills at zicorp.com>
Subject:  Korean Hangul frequency

Hello.

Some weeks ago, I posted a request for information on frequency of Hangul
characters in Korean text.  This is a summary of the responses I received.
	In general, the consensus was that such research is rare or
nonexistent, as it has less value in academia than in the commercial
sector, where I work.  However, here are some responses that helped me:

>>From Byong-seon Yang,

There is a published book on Hangul frequency, "Hangul Sayong Bindo-uy
Bunsuk" (An Analysis of Korean Frequency: 한글사용빈도의 분석), which is
written in Korean and published by the Korea Cultural Research Center, Korea
University Press, Seoul. The book analyzes frequency by consonant and vowel
(onset, coda), syllable, etc. Unfortunately, it is written in Korean. If
you read Korean, it will be useful to you, since it is essentially a set of
tables organized by frequency count. (The publisher's phone # is
82-2-3290-1610~8, fax: 82-2-926-8385.)
If you need more help, please contact me.

-
Byong-seon Yang, Ph.D.
Professor of English, Chair of Korean Studies
Jeonju University
Chonju, Korea 560-759
Tel) 82-63-220-2213 (Office)
     82-63-226-3294 (H)
Fax) 82-63-224-9920
E-mail) bsyang at www.jeonju.ac.kr

>>From Sean M. Witty,

Off the top of my head, I don't think any such documentation exists. If it
does, the numbers must be staggering.
Korean phonology is not as dynamic as that of English. Thus, there are fewer
possible syllables, overall, available to the language (I have compiled a
catalog). The total is further reduced because, although some syllables are
possible according to the phonology, they simply aren't used by the
language. Of those that are phonologically possible and used meaningfully,
the pronunciation may vary depending on the phonetic environment (reducing
the total possible number of syllables even further). The end result is a
5000+ year old language that uses a vocabulary based on a relatively small
number of syllables.
This leads to each syllable having more than one meaning, sometimes as many
as ten (thereby increasing the frequency of each). Take a common syllable
like 가 (ka), which has several meanings and is a case marker. The frequency
of usage for this one syllable, either in terms of meaningfulness or daily
usage, would be an extremely high number. This would also probably be true
of almost every other syllable in the language.

>>From Hyeri Joo,

If you're interested in frequencies of Korean words or morphemes, go
to the Web site <kibs.kaist.ac.kr>. The site is still developing, but it
will be very helpful for you since you're a computational linguist.

And the most informative response was from Ivan A. Derzhanski, who sent me
data from his own research on the subject:

My corpus consisted of 1 024 424 syllables' worth
of newspaper text, mostly from the Daily Hankyoreh.  There were
1526 different syllables found in the text, of the 2350 the KSC
code caters for.

Derzhanski's data includes counts for how many times each Hangul appeared in
his corpus, as well as counts on onset, nucleus, and coda jamo.  I include
his signature information here in case anyone wishes to contact him about
the data:

-
<fa-al-_haylu wa-al-laylu wa-al-baydA'u ta`rifunI
 wa-as-sayfu wa-ar-rum.hu wa-al-qir.tAsu wa-al-qalamu>
                       (Abu t-Tayyib Ahmad Ibn Hussayn al-Mutanabbi)
Ivan A Derzhanski                      http://www.math.bas.bg/~iad/
H: cplx Iztok bl 91, 1113 Sofia, Bulgaria          <iad at math.bas.bg>
W: Dept for Math Lx, Inst for Maths & CompSci, Bulg Acad of Sciences
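
For readers who want to produce counts of the same kind (per syllable, and
per onset, nucleus, and coda jamo) from their own text, here is a minimal
sketch in Python. It is not Derzhanski's method, and it makes an assumption
his data did not: that the text is Unicode, where the precomposed Hangul
syllables occupy U+AC00 through U+D7A3 and can be split arithmetically into
19 onsets, 21 nuclei, and 28 codas (including "no coda"). The function names
decompose and count_hangul are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumption: Unicode input. Precomposed Hangul syllables are laid out as
#   code point = 0xAC00 + (onset * 21 + nucleus) * 28 + coda
# with 19 onsets, 21 nuclei, and 28 codas (coda index 0 = no final consonant).
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3


def decompose(syllable):
    """Return (onset, nucleus, coda) indices for one precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    onset, rest = divmod(offset, 21 * 28)
    nucleus, coda = divmod(rest, 28)
    return onset, nucleus, coda


def count_hangul(text):
    """Count syllables and their onset, nucleus, and coda jamo in text.

    Characters outside the precomposed Hangul range are skipped. Jamo are
    reported as Unicode conjoining jamo (U+1100, U+1161, U+11A8 blocks).
    """
    syllables, onsets, nuclei, codas = Counter(), Counter(), Counter(), Counter()
    for ch in text:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            onset, nucleus, coda = decompose(ch)
            syllables[ch] += 1
            onsets[chr(0x1100 + onset)] += 1
            nuclei[chr(0x1161 + nucleus)] += 1
            if coda:  # index 0 means the syllable has no coda
                codas[chr(0x11A7 + coda)] += 1
    return syllables, onsets, nuclei, codas


if __name__ == "__main__":
    syllables, onsets, nuclei, codas = count_hangul("한글 사용 빈도의 분석")
    for syllable, n in syllables.most_common():
        print(syllable, n)
```

Calling most_common() on any of the four counters gives a frequency-ranked
list. Text in a legacy encoding would need to be decoded to Unicode first.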

Thanks to everyone who responded to my posting, and thanks especially to Ivan
Derzhanski for sharing his data.

	- Tim Mills -
	Zi Corporation

---------------------------------------------

Tim Mills, Computational Linguist
Zi Corporation
Suite 300, 500 - 4 Avenue SW
Calgary, Alberta
Canada T2P 2V6

Main:  (403) 233.8875
Direct:  (403) 231.4591
Fax:  (403) 231.4595
E-mail:  tmills at zicorp.com
Website:  www.zicorp.com

---------------------------------------------------------------------------
LINGUIST List: Vol-11-2532


