Resources: ELRA - Language Resources Catalogue - Update

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Fri Sep 14 20:04:20 UTC 2012


Date: Thu, 13 Sep 2012 17:43:21 +0200
From: ELRA ELDA Information <info at elda.org>
Message-ID: <5051FF19.5010406 at elda.org>
X-url: http://catalog.elra.info/product_info.php?products_id=1172
X-url: http://catalog.elra.info/product_info.php?products_id=1173
X-url: http://catalog.elra.info/product_info.php?products_id=1174
X-url: http://catalog.elra.info/product_info.php?products_id=1176


Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 2 new Speech Desktop/Microphone Resources
and 2 new Written Corpora are now available in its catalogue.

* ELRA-S0345 Spoken Portuguese Corpus
The Spoken Portuguese Corpus consists of a total of 86 recordings
(8h44m), collected from sociolinguistically diverse speakers who have
Portuguese as their mother tongue or as a second language. The corpus was
recorded in a situation of spontaneous oral communication, on different
themes of everyday life, with speakers of different ages and social and
professional backgrounds. The corpus consists of audio files in .wav
format, aligned transcriptions in XML Exmaralda format and
transcriptions in plain text.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1172

* ELRA-S0346 Fundamental Portuguese Corpus
The Fundamental Portuguese Corpus is a corpus of spoken language,
collected between 1970 and 1974, composed of 1800 recordings (500 hours)
made in Continental Portugal and the Islands. Of these 1800
conversations, a sample was selected and transcribed. The corpus
consists of audio files in .wav format, aligned transcriptions in XML
Exmaralda format and transcriptions in plain text.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1173

* ELRA-W0055 CINTIL-TreeBank
The CINTIL-TreeBank is a corpus of syntactic constituency trees of
Portuguese texts composed of 10,039 sentences and 110,166 tokens taken
from different sources and domains: news (8,861 sentences; 101,430
tokens) and novels (399 sentences; 3,082 tokens). In addition, there are
779 sentences (5,654 tokens) that are used for regression testing of the
computational grammar that supported the annotation of the corpus.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1174

* ELRA-W0056 CINTIL-PropBank
The CINTIL-PropBank is a corpus of sentences annotated with their
constituency structure and semantic role tags, composed of 10,039 
sentences and 110,166 tokens taken from different sources and domains: 
news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 
tokens). In addition, there are 779 sentences (5,654 tokens) used for 
regression testing of the computational grammar that supported the 
annotation of the corpus.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1176

For more information on the catalogue, please contact Valérie Mapelli
(mapelli at elda.org).

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: 
http://www.elra.info/LRs-Announcements.html

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------


