<!doctype html public "-//W3C//DTD W3 HTML//EN">
<html><head><style type="text/css"><!--
blockquote, dl, ul, ol, li { margin-top: 0 ; margin-bottom: 0 }
--></style><title>Re: [An-lang] AN corpora</title></head><body>
<div>In response to Ross Clark's note, there is at least one
electronic corpus of Samoan with frequency analysis. This was
compiled by Galumalemana Alfred Hunkin for his 2001 MA thesis: <i>A
Corpus of Contemporary Colloquial Samoan</i>, in the School of
Linguistics and Applied Linguistics, Victoria University of
Wellington. The corpus consists of about 300,000 words, made up
of 300 samples of spoken and written Samoan. Mr Hunkin
&lt;Alfred.Hunkin@vuw.ac.nz&gt; teaches Samoan at Victoria U.
<div>Andy Pawley</div>
<blockquote type="cite" cite>Someone asked me whether there are word
frequency statistics available for<br>
Samoan, such as exist for English and other big languages. I think
not, and further it occurred to me that such statistics depend on a
corpus of the language in question -- nowadays assumed to be
Corpus linguistics seems to be pretty trendy in English right now. But I
wonder whether there are comparable bodies of text for any
languages? At one time the Maori Studies people here had at least the
beginnings of one, and I believe the Maori Newspapers project aims<br>
eventually to have a searchable online corpus. Any other news?<br>
Ross Clark<br>
An-lang mailing list<br>