[An-lang] AN corpora

Nicholas Thieberger thien at unimelb.edu.au
Tue Jun 1 06:37:10 UTC 2004


Further to this discussion of corpora is the issue of how and where
these data sets are kept and how easy they are to obtain. Typically,
you have to ask the creator, assuming you know how to contact that
person and know that the data exist.

Further, what happens to these resources when the creator retires?

To assist in the controlled access to and the discovery and storage
of these kinds of resources, colleagues at the Universities of
Sydney, Melbourne and the ANU have established a digital archive
called PARADISEC (Pacific And Regional Archive for Digital Sources in
Endangered Cultures). This project has already digitised over 500
hours of audio tapes that were located in filing drawers in the
Coombs building at the ANU.

The focus of the project in its first year has been digitising audio
tapes, but we would also like to include theses, manuscripts,
wordlists, elicitation aids and so on. In short, the archive is a
resource to facilitate research, subject to normal copyright and
deposit conditions.

We are asking the community of scholars on AN-LANG to deposit in the
archive any digital material relevant to the list members. Deposit
and access forms can be found on our website, as can further details
about the project.

For any further details please contact me or visit our website.


Nick Thieberger

PARADISEC Project Manager
nicholas.thieberger at paradisec.org.au
http://paradisec.org.au

At 12:03 PM +1000 31/5/04, Andy Pawley wrote:
>In response to Ross Clark's note, there is at least one electronic
>corpus of  Samoan with frequency analysis. This was compiled by
>Galumalemana Alfred Hunkin for his 2001 MA thesis: A Corpus of
>Contemporary Colloquial Samoan, in the School of Linguistics and
>Applied Linguistics, Victoria University of Wellington.  The corpus
>consists of about 300,000 words, made up of 300 samples of spoken and
>written Samoan. Mr Hunkin <Alfred.Hunkin at vuw.ac.nz> teaches Samoan
>at Victoria U. Wellington.
>
>Andy Pawley
>
>>Someone asked me whether there are word frequency statistics available for
>>Samoan, such as exist for English and other big languages. I think probably
>>not, and further it occurred to me that such statistics depend on a corpus
>>of the language in question -- nowadays assumed to be computer-searchable.
>>Corpus linguistics seems to be pretty trendy in English right now. But I
>>wonder whether there are comparable bodies of text for any Austronesian
>>languages? At one time the Maori Studies people here had at least the
>>beginnings of one, and I believe the Maori Newspapers project aims
>>eventually to have a searchable online corpus. Any other news?
>>
>>Ross Clark
_______________________________________________
An-lang mailing list
An-lang at anu.edu.au
http://mailman.anu.edu.au/mailman/listinfo/an-lang

