[Corpora-List] ELRA - Language Resources Catalogue - Update

Thu Sep 13 15:43:21 UTC 2012

Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 2 new Speech Desktop/Microphone Resources 
and 2 new Written Corpora are now available in its catalogue.

ELRA-S0345 Spoken Portuguese Corpus
The Spoken Portuguese Corpus consists of 86 recordings (8h44m) 
collected from sociolinguistically diverse speakers who have Portuguese 
as their mother tongue or as a second language. The corpus was recorded 
in situations of spontaneous oral communication on various everyday 
themes, with speakers of different ages and social and professional 
backgrounds. It consists of audio files in .wav format, aligned 
transcriptions in EXMARaLDA XML format, and plain-text transcriptions.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1172

ELRA-S0346 Fundamental Portuguese Corpus
The Fundamental Portuguese Corpus is a corpus of spoken language 
collected between 1970 and 1974, comprising 1,800 recordings (500 hours) 
made in mainland Portugal and the islands. A sample of these 1,800 
conversations was selected and transcribed. The corpus consists of 
audio files in .wav format, aligned transcriptions in EXMARaLDA XML 
format, and plain-text transcriptions.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1173

ELRA-W0055 CINTIL-TreeBank
The CINTIL-TreeBank is a corpus of syntactic constituency trees of 
Portuguese texts, composed of 10,039 sentences and 110,166 tokens taken 
from different sources and domains: news (8,861 sentences; 101,430 
tokens) and novels (399 sentences; 3,082 tokens). In addition, it 
includes 779 sentences (5,654 tokens) used for regression testing of the 
computational grammar that supported the annotation of the corpus.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1174

ELRA-W0056 CINTIL-PropBank
The CINTIL-PropBank is a corpus of sentences annotated with their 
constituency structure and semantic role tags, composed of 10,039 
sentences and 110,166 tokens taken from different sources and domains: 
news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 
tokens). In addition, it includes 779 sentences (5,654 tokens) used for 
regression testing of the computational grammar that supported the 
annotation of the corpus.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1176

For more information on the catalogue, please contact Valérie Mapelli 
mailto:mapelli at elda.org

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: 
http://www.elra.info/LRs-Announcements.html