[Corpora-List] C-ORAL-ROM spoken corpus
Jean Veronis
Jean.Veronis at up.univ-mrs.fr
Thu Jan 20 20:47:37 UTC 2005
The C-ORAL-ROM corpus is available at ELRA/ELDA.
C-ORAL-ROM is a multilingual corpus of spontaneous speech for four
Romance languages (French, Italian, Portuguese, Spanish) of around
1,200,000 words (IST 2000-26228). The corpus consists of four
comparable recording collections of Italian, French, Portuguese and
Spanish spontaneous speech sessions (around 300,000 words for each
language). The collections are delivered respectively by the following
institutions:
* Università di Firenze (Dipartimento di Italianistica, LABLITA);
* Université de Provence (DELIC team, Description Linguistique
Informatisée sur Corpus);
* Fundação da Universidade de Lisboa/Centro de Linguística da
Universidade de Lisboa;
* Universidad Autónoma de Madrid (Departamento de Lingüística,
Lenguas Modernas, Lógica y F. de la Ciencia, Laboratorio de
Lingüística Informática).
The C-ORAL-ROM corpus provides the acoustic source of each session
together with the following main annotations:
* The orthographic transcription, in CHAT format, enriched with the
tagging of terminal and non-terminal prosodic breaks
* Session metadata
* The text-to-speech synchronization, in WIN PITCH CORPUS format,
based on the alignment of each transcribed utterance. The WIN
PITCH CORPUS software is provided with the resource.
More details in the ELRA/ELDA Catalogue:
Jean Véronis
Home: http://www.up.univ-mrs.fr/veronis
Blog: http://aixtal.blogspot.com