[Corpora-List] Query about corpora of spoken English

Paul Thompson p.a.thompson at reading.ac.uk
Sun Dec 4 12:34:35 UTC 2005


Briony Williams mentions the MARSEC corpus and gives a Reading URL for 
information on this. That page is well and truly out of date now, and 
Simon Arnfield, whose name is given as the contact person, no longer 
works at Reading University.

Reading no longer holds any of the MARSEC resources. Can anyone (perhaps
someone at Leeds or Lancaster) tell the list what the current state of
MARSEC is?

Paul Thompson

Briony Williams wrote:

> R.M.Salkie at bton.ac.uk wrote:
>
>> My colleague Nicolas Ballier (nicolas.ballier at lli.univ-paris13.fr)
>> has asked me to post the following two queries. Please reply directly
>> to him.
>
>
> It may be useful to others to have the replies in a public forum like
> this one, so here is a quick reply to the CORPORA list.
>
>> 1. Is there a web page which lists currently available corpora of
>> spoken English (e.g. MARSEC, the MAchine REadable Spoken ENglish
>> Corpus), stating whether the sound files are available?
>
>
> You could try the following pages:
>
> a) Linguistic Data Consortium ("speech" subset):
> http://www.ldc.upenn.edu/Catalog/byType.jsp#speech
>
> b) Evaluations and Language resources Distribution Agency (ELDA):
> http://www.elda.org/rubrique6.html
>
> c) International Computer Archive of Modern and Medieval English (ICAME):
> http://nora.hd.uib.no/whatis.html
>
> d) The MARSEC corpus:
> http://www.rdg.ac.uk/AcaDepts/ll/speechlab/marsec/
>
>> 2. Is there software available to align texts and sound files: for
>> example, software that enables the user to listen to any part of the
>> document by clicking on a word in the text?
>
>
> First, the sound file needs to be aligned with the linguistic
> annotation. Some popular applications for doing this manually are
> listed below (other applications exist for automatic segmentation of
> speech files). Once a word-level segmentation has been carried out,
> all of them let you click on an individual word and listen to it.
>
> a) Praat (has a very flexible scripting language):
> http://www.fon.hum.uva.nl/praat/
>
> b) Emu (handles the segment level and higher linguistic levels, plus
> hierarchical structure; has some scripting capability for building
> trees automatically):
> http://emu.sourceforge.net/
>
> c) Transcriber ("It provides a user-friendly graphical user interface 
> for segmenting long duration speech recordings, transcribing them, and 
> labeling speech turns, topic changes and acoustic conditions. It is 
> more specifically designed for the annotation of broadcast news 
> recordings, for creating corpora used in the development of automatic 
> broadcast news transcription systems, but its features might be found 
> useful in other areas of speech research.")
> http://trans.sourceforge.net/en/presentation.php
>
> d) MATE workbench ("a program designed to aid in the display, editing 
> and querying of annotated speech corpora")
> http://www.cogsci.ed.ac.uk/~dmck/MateCode/
>
>
> These are by no means the only tools available (I have omitted xlabel, 
> as it is no longer supported).
>
> Best regards
>
> Briony Williams
>
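
The tools listed above handle this for you, but purely to illustrate
what "click on a word and listen to it" involves once a word-level
segmentation exists, here is a minimal Python sketch. It assumes the
segmentation has already been exported as a list of (word, start, end)
times in seconds (a hypothetical format chosen for illustration; Praat,
Emu and Transcriber each store segmentations in their own formats) and
that the recording is an uncompressed PCM WAV file. It uses only the
standard-library wave module to cut out the frames for one word and
return them as a new, playable WAV file.

```python
import io
import wave
from typing import List, Tuple

# Hypothetical word-level segmentation entry: (word, start_s, end_s).
# Real tools use their own file formats; this tuple is an assumption.
Segment = Tuple[str, float, float]


def clip_for_word(wav_data: bytes, segments: List[Segment], index: int) -> bytes:
    """Return a standalone WAV file containing only the word at `index`.

    `index` stands in for the word the user clicked on in the text.
    """
    _word, start, end = segments[index]
    with wave.open(io.BytesIO(wav_data), "rb") as src:
        rate = src.getframerate()
        nchannels = src.getnchannels()
        sampwidth = src.getsampwidth()
        # Convert the word's time interval to frame positions,
        # clamped to the length of the recording.
        first = max(0, round(start * rate))
        last = min(src.getnframes(), round(end * rate))
        src.setpos(first)
        frames = src.readframes(max(0, last - first))

    # Write the extracted frames with the same format as the source.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(nchannels)
        dst.setsampwidth(sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return out.getvalue()


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV file, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * round(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    recording = make_silence(1.0)
    words = [("the", 0.0, 0.25), ("corpus", 0.25, 0.75)]
    clip = clip_for_word(recording, words, 1)  # user clicked "corpus"
    with wave.open(io.BytesIO(clip), "rb") as w:
        print(w.getnframes() / w.getframerate(), "seconds")
```

Returning a complete WAV file (rather than raw frames) means the clip
can be handed straight to any audio player; a real concordance or
transcript viewer would call something like this from its click handler.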

-- 
***************************************
Dr Paul Thompson
School Director of Postgraduate Studies
Department of Applied Linguistics
School of Languages and European Studies
The University of Reading
Whiteknights
Reading RG6 6AA
Tel. +44 118 3786472
URL: www.rdg.ac.uk/app_ling/
***************************************
