Resources: ELRA - Language Resources Catalogue - Update
Thierry Hamon
thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Nov 16 16:23:15 UTC 2007
Date: Fri, 16 Nov 2007 15:12:28 +0100
From: ELDA <info at elda.org>
Message-ID: <473DA54C.5090209 at elda.org>
Our apologies if you have received multiple copies of this announcement.
*******************************************************************
ELRA - Language Resources Catalogue - Update
*******************************************************************
ELRA is happy to announce that 10 new speech resources from the LC-STAR
and TC-STAR projects are now available in its catalogue.
*ELRA-S0245 LC-STAR German Phonetic lexicon*
The LC-STAR German Phonetic lexicon comprises 102,169 entries,
including a set of 55,507 common words, a set of 46,662 proper names
(including person names, family names, cities, streets, companies and
brand names) and a list of 6,763 special application words. The lexicon
is provided in XML format and includes phonetic transcriptions in
SAMPA.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1028
*ELRA-S0246 LC-STAR German Phonetic lexicon in the Touristic Domain*
The LC-STAR German Phonetic lexicon in the Touristic Domain comprises
8,782 entries from the following categories: nouns, adjectives and
verbs. For each entry the following information is provided:
orthographic form, part of speech (POS) and phonemic transcription. The
lexicon is provided in XML format and includes phonetic transcriptions
in SAMPA.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1029
*ELRA-S0247 LC-STAR Standard Arabic Phonetic lexicon*
The LC-STAR Standard Arabic Phonetic lexicon comprises 110,271 entries,
including a set of 52,981 common words, a set of 50,135 proper names
(including person names, family names, cities, streets, companies and
brand names) and a list of 7,155 special application words. The lexicon
is provided in XML format and includes phonetic transcriptions in SAMPA.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1030
*ELRA-S0248 LC-STAR English-German Bilingual Aligned Phrasal lexicon*
The LC-STAR English-German Bilingual Aligned Phrasal lexicon comprises
10,733 phrases from the tourist domain. It is based on a list of short
sentences obtained by translation from a US-English corpus of 10,518
phrases.
The lexicon is provided in XML format.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1031
*ELRA-S0249 TC-STAR English Training Corpora for ASR: Transcriptions of
EPPS Speech*
This corpus consists of transcriptions from 92 hours of EPPS (European
Parliament Plenary Sessions) speeches held or interpreted in European
English (a mixture of native and non-native English). The transcription
files are stored in Transcriber XML file format.
For corresponding recordings, see ELRA-S0251.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1032
*ELRA-S0250 TC-STAR English-Spanish Training Corpora for Machine
Translation: Aligned Final Text Editions of EPPS*
This corpus consists of 34 million (English) and 38 million (Spanish)
running words of bilingual, sentence-segmented and aligned texts in
English and Spanish, obtained from the Final Text Editions provided by
the European Parliament (from April 1996 to Sept. 2004, Dec. 2004 to
May 2005, and Dec. 2005 to May 2006). The data is accompanied by tools
for further preprocessing.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1033
*ELRA-S0251 TC-STAR English Training Corpora for ASR: Recordings of EPPS
Speech*
This corpus consists of around 290 hours of recordings of EPPS
(European Parliament Plenary Sessions) speeches held or interpreted in
European English, 92 hours of which were annotated (transcribed); the
transcriptions are not provided in the present package. Each file
contains a single channel with 16-bit resolution at a sample rate of 16kHz.
For corresponding transcriptions, see ELRA-S0249.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1034
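As a rough guide to the storage such a package may need, the audio
parameters stated above (single channel, 16-bit, 16 kHz, around 290 hours)
can be turned into a size estimate. The sketch below is only an
illustration: it assumes raw uncompressed PCM with no container overhead,
which the announcement does not state, and the function name
pcm_size_bytes is hypothetical.

```python
# Rough storage estimate for uncompressed PCM audio.
# Assumptions (not stated in the announcement): raw PCM, no container
# overhead, no compression; decimal gigabytes (1 GB = 10**9 bytes).

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated for ELRA-S0251
BYTES_PER_SAMPLE = 2      # 16-bit resolution
CHANNELS = 1              # single channel


def pcm_size_bytes(hours: float) -> int:
    """Return the raw PCM size in bytes for a duration given in hours."""
    seconds = hours * 3600
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    size = pcm_size_bytes(290)  # "around 290 hours" per the description
    print(f"{size / 10**9:.1f} GB")
```

Under these assumptions the estimate comes to roughly 33 GB; the actual
delivered size depends on the file format used in the package.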
*ELRA-S0252 TC-STAR Spanish Training Corpora for ASR: Recordings of EPPS
Speech*
This corpus consists of around 283 hours of recordings of EPPS
(European Parliament Plenary Sessions) speeches held or interpreted in
European Spanish (a mixture of native and non-native Spanish). Each file
contains a single channel with 16-bit resolution at a sample rate of 16kHz.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1036
*ELRA-S0253 TC-STAR English Test Corpora for ASR*
This corpus consists of 70 hours of recordings of EPPS (European
Parliament Plenary Sessions) speeches held or interpreted in European
English and other European languages. From this corpus, 16 hours of
English speeches (native or non-native) were annotated (transcribed).
Each speech file contains a single channel with 16-bit resolution at a
sample rate of 16kHz. The transcription files are stored in Transcriber
XML file format.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1037
*ELRA-S0254 TC-STAR Spanish Test Corpora for ASR*
This corpus consists of 174 hours of recordings of EPPS (European
Parliament Plenary Sessions) speeches held or interpreted in European
Spanish and other European languages. From this corpus, 16 hours of
Spanish speeches were annotated (transcribed). Each audio file contains
a single channel with 16-bit resolution at a sample rate of 16kHz. The
transcription files are stored in Transcriber XML file format.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1038
For more information on the catalogue, please contact Valérie Mapelli
mailto:mapelli at elda.org
Visit our on-line catalogue: http://catalog.elra.info
-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version:
Archives: http://listserv.linguistlist.org/archives/ln.html
http://liste.cines.fr/info/ln
The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------