Corpora: Is corpus phonetics a part of corpus linguistics?

Van den Heuvel M, Mev MVDH at sun.ac.za
Mon Jan 7 06:35:05 UTC 2002


Offhand, it may seem strange to use indications derived from orthography to
perform the sort of phonemic calculations we're talking about. However, where
there are no speech corpora at all for the languages you wish to study,
starting out with calculations of phonemic occurrence based on orthographic
strings is a very good way of designing corpus content that elicits the
complete phonetic inventory of a language. From a normalised frequency score
computed on a fairly large text from the language (minimum 50 000 words), you
can determine the frequency and contexts of, say, /p/, and make sure to
include it in all possible contexts in the items to be collected for a speech
corpus. This way, you can be sure to cover the allophonic variation of /p/
comprehensively. The usefulness of this sort of calculation is, however,
limited. The interesting corpus phonetics starts when you get your hands on
the actual data you've collected! :-)
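A minimal Python sketch of the procedure described above, under loudly stated
assumptions: each letter stands in for one phoneme (so digraphs such as "ph"
are ignored, which only works for fairly shallow orthographies), the
"normalised frequency" is taken to mean occurrences per 1,000 words, and a
"context" is simply the pair of neighbouring letters, with # marking a word
boundary. The function names grapheme_profile and select_prompts are
hypothetical. The second function greedily picks prompt sentences until every
context seen in the candidate pool is covered; greedy set cover is one
convenient choice, not necessarily how the post's author selects items.

```python
import re
from collections import Counter


def grapheme_profile(text, grapheme, per_words=1000):
    """Return (frequency per `per_words` words, Counter of contexts).

    Assumption: one letter = one phoneme; multi-letter graphemes are not handled.
    A context is the (left neighbour, right neighbour) pair; '#' is a word edge.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())  # letter-only tokens (Unicode-aware)
    contexts = Counter()
    for word in words:
        padded = f"#{word}#"
        for i in range(1, len(padded) - 1):
            if padded[i] == grapheme:
                contexts[(padded[i - 1], padded[i + 1])] += 1
    hits = sum(contexts.values())
    freq = hits / len(words) * per_words if words else 0.0
    return freq, contexts


def select_prompts(sentences, grapheme):
    """Greedily choose sentences until every context found in the pool is covered."""
    def ctx(sentence):
        return set(grapheme_profile(sentence, grapheme)[1])

    needed = set().union(*(ctx(s) for s in sentences))
    chosen = []
    while needed:
        # pick the sentence that covers the most still-missing contexts
        best = max(sentences, key=lambda s: len(ctx(s) & needed))
        gain = ctx(best) & needed
        if not gain:  # defensive guard; every needed context comes from the pool
            break
        chosen.append(best)
        needed -= gain
    return chosen


# Toy pool, far smaller than the 50 000 words recommended above.
pool = [
    "Pat put the apple in a cup.",
    "Stop the pup.",
    "A spy stops by.",
    "Put the pup up.",
]
freq, contexts = grapheme_profile(" ".join(pool), "p")
print(f"{freq:.1f} per 1000 words, {len(contexts)} contexts")
# prints "777.8 per 1000 words, 8 contexts"
print(select_prompts(pool, "p"))
# prints ['Pat put the apple in a cup.', 'A spy stops by.', 'Stop the pup.']
```

The fourth sentence is skipped because its contexts for "p" are already
covered by the first, which is the point of selecting prompts by context
rather than by raw frequency.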

Maritza den Heuvel

***


-----Original Message-----
From: Alex Chengyu Fang [mailto:alex_chengyu at yahoo.co.uk]
Sent: 04 January 2002 16:52
To: Yuri Tambovtsev; corpora at hd.uib.no
Subject: Re: Corpora: Is corpus phonetics a part of corpus linguistics?


By "corpus phonetics" I would understand a study
based on a corpus of recorded speech, which makes it
a rather redundant expression, since phonetics has
traditionally been largely a field-based discipline.
You may find interesting an ICAME article by Halliday
which mentions the pioneering efforts of his teacher
Wang Li to construct a corpus of recorded Cantonese.

Your own study seems to be based on indications
derived from orthography or transcribed speech,
which relate only indirectly to phonetic features.
If so, it is certainly part of corpus linguistics
but needs a more self-evident name, something like
"text-based phonetics", which, admittedly, sounds a
bit like a contradiction in terms.

Alex


