Corpora: ELRA News

Valerie Mapelli mapelli at elda.fr
Tue Apr 25 15:34:19 UTC 2000


[ We apologise for the duplicate posting of this announcement ]
___________________________________________________________
				ELRA
		European Language Resources Association
			       ELRA News 
___________________________________________________________

		     *** ELRA NEW RESOURCES ***

We are happy to announce new resources available via ELRA:

ELRA-S0083 ISLE Speech Corpus
ELRA-W0015 "Le Monde" Text Corpus - Year 1999

A description of each database is given below.

_______________________________________
ELRA-S0083 ISLE Speech Corpus
_______________________________________

This corpus contains approximately 20 minutes of speech 
(per speaker) from 23 German and 23 Italian intermediate 
learners of English. Each speaker recorded sentences from 
several blocks of various types (reading simple sentences, 
using minimal pairs, giving answers to multiple choice 
questions). The prompts were of varying perplexities.

About 2/3 of the data for each speaker was annotated by a 
team of linguists. The files were corrected first at the word  
level, and an automatic recogniser was then used to produce 
phone-level annotations. The annotator then re-annotated 
each sentence to mark phone and stress errors (e.g., 
substitutions, insertions, or deletions). 

Corpus details:
· a total of 46 speakers (23 German and 23 Italian)
· 11484 utterances
· 1.92 gigabytes of WAV files (4 CDs)
· 17 hours, 54 minutes, and 44 seconds of speech data
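
As a rough cross-check of the figures above, the short Python sketch below derives per-speaker and per-utterance averages from the listed totals. These averages are not stated in the announcement; they are computed here only for illustration, and the function and variable names are hypothetical. The result of roughly 23 minutes per speaker is in line with the "approximately 20 minutes" quoted earlier.

```python
# Figures copied from the "Corpus details" list; everything derived
# from them below is an illustrative calculation, not an official ELRA figure.
SPEAKERS = 46
UTTERANCES = 11484
HOURS, MINUTES, SECONDS = 17, 54, 44


def total_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def corpus_averages(speakers=SPEAKERS, utterances=UTTERANCES):
    """Return simple averages derived from the published totals."""
    total = total_seconds(HOURS, MINUTES, SECONDS)
    return {
        "total_seconds": total,
        "minutes_per_speaker": total / speakers / 60,
        "seconds_per_utterance": total / utterances,
        "utterances_per_speaker": utterances / speakers,
    }


if __name__ == "__main__":
    for name, value in corpus_averages().items():
        print(f"{name}: {value:.2f}")
```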
 
A more detailed description of the ISLE corpus will be 
available in the proceedings of LREC 2000. An electronic 
copy of this paper may be obtained from ELRA 
(Reference: W. Menzel, E. Atwell, P. Bonaventura, D. Herron, 
P. Howarth, R. Morton, and C. Souter (in preparation). "The 
ISLE corpus of non-native spoken English", Proc. Second 
International Conference on Language Resources and Evaluation).

_______________________________________
ELRA-W0015 "Le Monde" Text Corpus - Year 1999
_______________________________________

Electronic archiving of "Le Monde" articles started on 1 
January 1987. Some 200 articles are added every day, making 
it the largest archive of its kind among French daily 
newspapers. The corpus is available in ASCII text format. Each 
month consists of some 10 MB of data (circa 120 MB per year). 
Data from 1987 to 1999 are available through ELRA (each buyer 
may purchase up to 5 years of data).

=====================================
For further information, please contact:

     ELRA/ELDA	               Tel  +33 01 43 13 33 33
     55-57 rue Brillat-Savarin         Fax  +33 01 43 13 33 30
     F-75013 Paris, France           E-mail  mapelli at elda.fr

or visit the online catalogue on our Web site:

     http://www.icp.grenet.fr/ELRA/home.html
     or http://www.elda.fr
===================================== 


