18.3424, Software: ELRA Language Resources Catalogue 11/07-2

LINGUIST Network linguist at LINGUISTLIST.ORG
Sun Nov 18 00:12:13 UTC 2007


LINGUIST List: Vol-18-3424. Sat Nov 17 2007. ISSN: 1068 - 4875.

Subject: 18.3424, Software: ELRA Language Resources Catalogue 11/07-2

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hannah Morales <hannah at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 16-Nov-2007
From: Hélène Mazo < mazo at elda.org >
Subject: ELRA Language Resources Catalogue 11/07-2

 

	
-------------------------Message 1 ---------------------------------- 
Date: Sat, 17 Nov 2007 19:10:56
From: Hélène Mazo [mazo at elda.org]
Subject: ELRA Language Resources Catalogue 11/07-2


ELRA is happy to announce that 6 new Speech Resources from the TC-STAR
project are now available in its catalogue.

ELRA-S0249 TC-STAR English Training Corpora for ASR: Transcriptions of EPPS
Speech: 
This corpus consists of transcriptions from 92 hours of EPPS (European
Parliament Plenary Sessions) speeches held or interpreted in European
English (a mixture of native and non-native English). The transcription
files are stored in Transcriber XML file format. For corresponding
recordings, see ELRA-S0251. For more information, see:
http://catalog.elra.info/product_info.php?products_id=1032

ELRA-S0250 TC-STAR English-Spanish Training Corpora for Machine
Translation: Aligned Final Text Editions of EPPS: 
This corpus consists of 34 million (English) and 38 million (Spanish)
running words of bilingual, sentence-segmented and aligned texts in English
and Spanish, obtained from the Final Text Editions provided by the European
Parliament (from April 1996 to Sept. 2004, Dec. 2004 to May 2005, and Dec.
2005 to May 2006). The data is accompanied by tools for further
preprocessing. For more information, see:
http://catalog.elra.info/product_info.php?products_id=1033

ELRA-S0251 TC-STAR English Training Corpora for ASR: Recordings of EPPS
Speech: 
This corpus consists of the recordings of around 290 hours from EPPS
(European Parliament Plenary Sessions) speeches held or interpreted in
European English, 92 hours of which were annotated (transcribed); the
transcriptions are not provided in the present package. Each file contains
a single channel with 16-bit resolution at a sample rate of 16kHz. For
corresponding transcriptions, see ELRA-S0249. For more information, see:
http://catalog.elra.info/product_info.php?products_id=1034

ELRA-S0252 TC-STAR Spanish Training Corpora for ASR: Recordings of EPPS
Speech:  
This corpus consists of the recordings of around 283 hours from EPPS
(European Parliament Plenary Sessions) speeches held or interpreted in
European Spanish (a mixture of native and non-native Spanish). Each file
contains a single channel with 16-bit resolution at a sample rate of 16kHz.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1036

ELRA-S0253 TC-STAR English Test Corpora for ASR: 
This corpus consists of 70 hours of recordings of EPPS (European Parliament
Plenary Sessions) speeches held or interpreted in European English and
other European languages. From this corpus, 16 hours of English speeches
(native or non-native) were annotated (transcribed). Each speech file
contains a single channel with 16-bit resolution at a sample rate of 16kHz.
The transcription files are stored in Transcriber XML file format. For more
information, see:
http://catalog.elra.info/product_info.php?products_id=1037

ELRA-S0254 TC-STAR Spanish Test Corpora for ASR:
This corpus consists of 174 hours of recordings of EPPS (European
Parliament Plenary Sessions) speeches held or interpreted in European
Spanish and other European languages. From this corpus, 16 hours of Spanish
speeches were annotated (transcribed). Each audio file contains a single
channel with 16-bit resolution at a sample rate of 16kHz. The transcription
files are stored in Transcriber XML file format. For more information, see:
http://catalog.elra.info/product_info.php?products_id=1038

For more information on the catalogue, please contact Valérie Mapelli
<mapelli at elda.org>.

Visit our on-line catalogue: http://catalog.elra.info. 
Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics





-----------------------------------------------------------
LINGUIST List: Vol-18-3424	

	


