15.3192, Sum: Phonetically Transcribed Italian Corpus

Sat Nov 13 21:24:08 UTC 2004

LINGUIST List: Vol-15-3192. Sat Nov 13 2004. ISSN: 1068 - 4875.

Subject: 15.3192, Sum: Phonetically Transcribed Italian Corpus

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org) 
        Sheila Collberg, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Jessica Boynton <jessica at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 13-Nov-2004
From: Christina Villafana < cmv2 at georgetown.edu >
Subject: Phonetically Transcribed Italian Corpus 

-------------------------Message 1 ---------------------------------- 
Date: Sat, 13 Nov 2004 16:17:19
From: Christina Villafana < cmv2 at georgetown.edu >
Subject: Phonetically Transcribed Italian Corpus 

Regarding query http://www.linguistlist.org/issues/15/15-3137.html

Thanks to the following people who responded to my recent query:

Caren Brinckmann, Saarland University
Federico Albano Leoni, Universita' di Napoli
Kristie McCrary
Giuliana Fiorentino, Universita' Roma Tre

The AVIP (Archivio delle Varietà di Italiano Parlato) corpus, a joint
project of the Laboratorio Linguistica of the Scuola Normale Superiore in
Pisa and the Linguistics Department of the Universita' di Napoli Federico
II, includes a collection of Italian map-task dialogues. Seventy-five
minutes of spontaneous speech are phonetically segmented and labelled.

There is a short description on LINGUIST List: 

http://linguistlist.org/issues/13/13-611.html#2

And a description of the project is available at:

http://www.cirass.unina.it/ricerca/studi%20parlato/raccolta%20corpora/raccoltacorpora.htm

This corpus is freely available via ftp: http://ftp.cirass.unina.it/avip/
with documentation under http://ftp.cirass.unina.it/avip/doc_app/
(mostly in Italian).

Another corpus, API (Archivio di Parlato Italiano), is freely available on
DVD. To request it, email Paola Petrone (CIRASS, Università di Napoli) at
petrone at unina.it. The DVD includes a query generator.

From what I understand, most of the phonetic transcriptions are in SAMPA
or X-SAMPA, and the transcription files are split into speaker turns, so
they need to be concatenated.
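
The concatenation step could be sketched in Python roughly as follows.
This is only an illustration under assumptions that are not confirmed by
the corpus documentation: that each speaker turn sits in its own plain-text
file, that all turn files for a dialogue share one directory, and that the
numbers in the file names reflect turn order. The function name, the
"*.txt" pattern, and the encoding are hypothetical.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the file name into text and digit runs so that "turn2"
    # sorts before "turn10" (digit runs are compared as integers).
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p for p in parts]


def concatenate_turns(directory, pattern="*.txt", encoding="utf-8"):
    """Join all per-turn files in `directory` into one string.

    Assumption: one transcription file per speaker turn, ordered by
    the numbers in the file names. Adjust `pattern` and `encoding`
    to whatever the corpus actually uses.
    """
    files = sorted(Path(directory).glob(pattern), key=natural_key)
    return " ".join(f.read_text(encoding=encoding).strip() for f in files)
```

Sorting on the numeric parts of the names matters because a plain
alphabetical sort would place a tenth turn before a second one.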

This information has been extremely helpful, but I am still looking for an
easy way to get phoneme frequencies for Standard Italian, so if anyone has
further information, please let me know!
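
For what it is worth, once the transcriptions are in a single string, a
rough frequency count might look like the sketch below. It assumes the
SAMPA symbols are separated by whitespace, which may not match the actual
file layout of either corpus; the function name is hypothetical, and any
markers for pauses, noise, or turn boundaries would need to be filtered
out first.

```python
from collections import Counter


def phoneme_frequencies(transcription):
    """Return {symbol: relative frequency}, most frequent first.

    Assumption: `transcription` is a string of SAMPA symbols
    separated by whitespace, with no non-phonemic markup left in.
    """
    counts = Counter(transcription.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {symbol: n / total for symbol, n in counts.most_common()}


if __name__ == "__main__":
    # Tiny made-up example, not corpus data.
    for symbol, freq in phoneme_frequencies("k a z a").items():
        print(symbol, round(freq, 3))
```

Counting whitespace-separated tokens keeps multi-character SAMPA symbols
intact, which a character-by-character count would split apart.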

Christina Villafana
Department of Linguistics
Georgetown University
Washington DC
cmv2 at georgetown.edu 

Linguistic Field(s): Phonetics; Text/Corpus Linguistics

-----------------------------------------------------------
LINGUIST List: Vol-15-3192