Resources: ELRA - Language Resources Catalogue - Update
Thierry Hamon
thierry.hamon at UNIV-PARIS13.FR
Fri Sep 14 20:04:20 UTC 2012
Date: Thu, 13 Sep 2012 17:43:21 +0200
From: ELRA ELDA Information <info at elda.org>
Message-ID: <5051FF19.5010406 at elda.org>
Our apologies if you have received multiple copies of this announcement.
*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is pleased to announce that two new Speech (Desktop/Microphone)
resources and two new Written Corpora are now available in its catalogue.
* ELRA-S0345 Spoken Portuguese Corpus
The Spoken Portuguese Corpus consists of 86 recordings (8 h 44 min in
total), collected from sociolinguistically diverse speakers who have
Portuguese either as their mother tongue or as a second language. The
recordings capture spontaneous oral communication on everyday topics,
with speakers of different ages and of different social and
professional backgrounds. The corpus consists of audio files in .wav
format, transcriptions aligned with the audio in EXMARaLDA XML format,
and plain-text transcriptions.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1172
* ELRA-S0346 Fundamental Portuguese Corpus
The Fundamental Portuguese Corpus is a spoken-language corpus collected
between 1970 and 1974, comprising 1,800 recordings (500 hours) made in
mainland Portugal and the islands. A sample of these 1,800
conversations was selected and transcribed. The corpus consists of
audio files in .wav format, transcriptions aligned with the audio in
EXMARaLDA XML format, and plain-text transcriptions.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1173
* ELRA-W0055 CINTIL-TreeBank
The CINTIL-TreeBank is a corpus of syntactic constituency trees for
Portuguese texts, comprising 10,039 sentences and 110,166 tokens taken
from different sources and domains: news (8,861 sentences; 101,430
tokens) and novels (399 sentences; 3,082 tokens). These totals also
include 779 sentences (5,654 tokens) used for regression testing of the
computational grammar that supported the annotation of the corpus.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1174
* ELRA-W0056 CINTIL-PropBank
The CINTIL-PropBank is a corpus of sentences annotated with their
constituency structure and semantic role tags, comprising 10,039
sentences and 110,166 tokens taken from different sources and domains:
news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082
tokens). These totals also include 779 sentences (5,654 tokens) used
for regression testing of the computational grammar that supported the
annotation of the corpus.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1176
For more information on the catalogue, please contact Valérie Mapelli
mailto:mapelli at elda.org
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates:
http://www.elra.info/LRs-Announcements.html
-------------------------------------------------------------------------
Message distributed by the Langage Naturel mailing list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version:
Archives: http://listserv.linguistlist.org/archives/ln.html
http://liste.cines.fr/info/ln
The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------