Corpora: ELRA News

Valerie Mapelli mapelli at elda.fr
Wed Jan 19 15:37:36 UTC 2000


[ We apologise for the duplicate posting of this announcement ]

___________________________________________________________
				ELRA
		European Language Resources Association
			       ELRA News
___________________________________________________________


		     *** ELRA NEW RESOURCES ***


We are happy to announce a new resource available via ELRA:
	_______________________________________
	ELRA-S0076 French SpeechDat(II) FDB-5000
	_______________________________________

	The French SpeechDat(II) FDB-5000 comprises 5,040
	French speakers recorded over the French fixed telephone
	network. Forty speakers were added to the original 5,000
	speakers to meet the requirements of the database. The
	database is partitioned across 18 CDs, each of which contains
	300 speaker sessions (except for CD 4, with 100 speaker
	sessions). The speech databases made within the SpeechDat(II)
	project were validated by SPEX, the Netherlands, to assess
	their compliance with the SpeechDat format and content
	specifications.

	The speech files are stored as sequences of 8-bit, 8 kHz A-law samples
	and are not compressed. Each prompted utterance is stored in a separate
	file and has an accompanying ASCII SAM label file.
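
	For readers who want to inspect the audio, the following is a minimal
	sketch of decoding 8-bit A-law bytes (ITU-T G.711) into signed 16-bit
	linear samples. It assumes the speech files are raw and headerless,
	with one byte per sample at 8 kHz; the function names are illustrative
	and are not part of the SpeechDat distribution.

```python
def alaw_to_linear(a_val: int) -> int:
    """Decode one 8-bit A-law code (ITU-T G.711) to a signed 16-bit sample."""
    a_val ^= 0x55                       # undo the even-bit inversion
    t = (a_val & 0x0F) << 4             # 4-bit mantissa
    seg = (a_val & 0x70) >> 4           # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a_val & 0x80 else -t    # sign bit set means positive


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a whole buffer of A-law bytes to linear samples."""
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(raw: bytes, sample_rate: int = 8000) -> float:
    """Duration of a buffer, assuming one byte per sample and no header."""
    return len(raw) / sample_rate
```

	Under these assumptions, a 16,000-byte file holds two seconds of speech.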

	The following items were recorded:
	- 5 application words;
	- 1 sequence of 10 isolated digits;
	- 4 connected digits: 1 sheet number (5+ digits), 1 telephone number
	(9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits);
	- 3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date (word
	style), 1 relative and general date expression;
	- 2 word spotting phrases using an application word (embedded);
	- 1 isolated digit;
	- 3 spelled-out words (letter sequences): 1 spontaneous, e.g. own
	forename; 1 spelling of directory assistance city name; 1 real/artificial
	name for coverage;
	- 1 currency money amount;
	- 1 natural number;
	- 5 directory assistance names + 1 spelled-out name: 1 spontaneous,
	e.g. own forename; 1 city of birth / hometown (spontaneous); 1 most
	frequent city (out of 500); 1 most frequent company/agency (out of 500);
	1 "forename surname"; 1 spelled-out city of birth;
	- 2 questions, including "fuzzy" yes/no: 1 predominantly "yes" question,
	1 predominantly "no" question;
	- 9 phonetically rich sentences;
	- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
	- 8 phonetically rich words.
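
	The list above can be tallied to give the nominal number of items per
	speaker session. The sketch below assumes that each listed item is one
	prompted utterance (so the sequence of 10 isolated digits counts once);
	the dictionary keys are shorthand labels, not official item codes.

```python
# Item counts copied from the list above; keys are shorthand labels.
items_per_session = {
    "application words": 5,
    "sequence of 10 isolated digits": 1,
    "connected digits": 4,
    "dates": 3,
    "word spotting phrases": 2,
    "isolated digit": 1,
    "spelled-out words": 3,
    "currency money amount": 1,
    "natural number": 1,
    "directory assistance names (5 + 1 spelled-out)": 6,
    "yes/no questions": 2,
    "phonetically rich sentences": 9,
    "time phrases": 2,
    "phonetically rich words": 8,
}

# Nominal number of utterances recorded in one session.
nominal_items = sum(items_per_session.values())
print(nominal_items)
```

	This is a nominal figure only; the number of files actually present in
	a given session may differ.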

	The following age distribution has been obtained: 215 speakers are under
	16 years old, 2531 speakers are between 16 and 30, 1208 speakers are
	between 31 and 45, 910 speakers are between 46 and 60, and 176 speakers
	are over 60.
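
	As a quick consistency check, the age groups above can be summed and
	expressed as shares of the whole. The group labels in this sketch are
	shorthand; the counts are those given in the paragraph above.

```python
# Speaker counts per age group, as stated in the announcement.
age_groups = {
    "under 16": 215,
    "16-30": 2531,
    "31-45": 1208,
    "46-60": 910,
    "over 60": 176,
}

total_speakers = sum(age_groups.values())

# Percentage share of each group, rounded to one decimal place.
shares = {
    group: round(100 * count / total_speakers, 1)
    for group, count in age_groups.items()
}

print(total_speakers)
for group, share in shares.items():
    print(f"{group}: {share}%")
```

	The groups sum to the 5,040 speakers stated for the database, and about
	half of the speakers fall in the 16 to 30 group.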

	A pronunciation lexicon with a phonemic transcription in SAMPA is also
	included.

=====================================
For further information, please contact :

     ELRA/ELDA		Tel : +33 01 43 13 33 33
     55-57 rue Brillat-Savarin	Fax : +33 01 43 13 33 30
     F-75013 Paris, France	E-mail : mapelli at elda.fr

or visit our Web site:

     http://www.icp.grenet.fr/ELRA/home.html
=====================================


