Corpora: ELRA News
Valerie Mapelli
mapelli at elda.fr
Tue Apr 25 15:34:19 UTC 2000
[ We apologise for the duplicate posting of this announcement ]
___________________________________________________________
ELRA
European Language Resources Association
ELRA News
___________________________________________________________
*** ELRA NEW RESOURCES ***
We are happy to announce new resources available via ELRA:
ELRA-S0083 ISLE Speech Corpus
ELRA-W0015 "Le Monde" Text Corpus - Year 1999
A description of each database is given below.
_______________________________________
ELRA-S0083 ISLE Speech Corpus
_______________________________________
This corpus contains approximately 20 minutes of speech
(per speaker) from 23 German and 23 Italian intermediate
learners of English. Each speaker recorded sentences from
several blocks of various types (reading simple sentences,
using minimal pairs, giving answers to multiple choice
questions). The prompts were of varying perplexities.
About 2/3 of the data for each speaker was annotated by a
team of linguists. The files were corrected first at the word
level, and an automatic recogniser was then used to produce
phone-level annotations. The annotator then re-annotated
each sentence to mark phone and stress errors (e.g.,
substitutions, insertions, or deletions).
Corpus details:
· a total of 46 speakers (23 German and 23 Italian)
· 11484 utterances
· 1.92 gigabytes of WAV files (4 CDs)
· 17 hours, 54 minutes, and 44 seconds of speech data
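
As a rough cross-check of the figures listed above, the short Python sketch below derives per-speaker and per-utterance averages from the stated totals. The function names are illustrative only and are not part of the corpus distribution. With these totals the averages come out at roughly 23 minutes of speech and about 250 utterances per speaker, in line with the approximate figure of 20 minutes per speaker quoted in the description.

```python
# Figures copied from the corpus details above.
SPEAKERS = 46
UTTERANCES = 11484
TOTAL_SECONDS = 17 * 3600 + 54 * 60 + 44  # 17 h 54 min 44 s


def minutes_per_speaker(total_seconds=TOTAL_SECONDS, speakers=SPEAKERS):
    """Average amount of speech per speaker, in minutes."""
    return total_seconds / speakers / 60


def utterances_per_speaker(utterances=UTTERANCES, speakers=SPEAKERS):
    """Average number of utterances recorded by each speaker."""
    return utterances / speakers


def seconds_per_utterance(total_seconds=TOTAL_SECONDS, utterances=UTTERANCES):
    """Average utterance duration, in seconds."""
    return total_seconds / utterances


if __name__ == "__main__":
    print(f"{minutes_per_speaker():.1f} min of speech per speaker")
    print(f"{utterances_per_speaker():.0f} utterances per speaker")
    print(f"{seconds_per_utterance():.1f} s per utterance")
```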
A much more detailed description of the ISLE corpus
will appear in the proceedings of LREC 2000. An
electronic copy of the paper may be obtained from ELRA
(Reference: W. Menzel, E. Atwell, P. Bonaventura, D. Herron,
P. Howarth, R. Morton, and C. Souter (in preparation). "The
ISLE corpus of non-native spoken English", Proc. Second
International Conference on Language Resources and Evaluation).
_______________________________________
ELRA-W0015 "Le Monde" Text Corpus - Year 1999
_______________________________________
Electronic archiving of "Le Monde" articles started on 1
January 1987. Some 200 articles are added every day, making
it the largest archive of its kind among French daily
newspapers. The corpus is available in ASCII text format. Each
month amounts to some 10 MB of data (about 120 MB per year).
Data from 1987 to 1999 are available through ELRA (each buyer
may purchase up to 5 years of data).
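
For planning storage, the approximate sizes quoted above can be turned into a quick estimate. The Python sketch below assumes the figure of about 10 MB per month holds for every year, which the announcement gives only as an approximation; the names used are illustrative. Under that assumption, the 13 years from 1987 to 1999 total about 1560 MB, and the largest purchase of 5 years about 600 MB.

```python
# Approximate figures taken from the description above.
FIRST_YEAR, LAST_YEAR = 1987, 1999
MB_PER_MONTH = 10          # "some 10 MB" per month, an approximation
MAX_YEARS_PER_BUYER = 5    # purchase limit stated in the announcement


def estimated_size_mb(years, mb_per_month=MB_PER_MONTH):
    """Rough size in MB of the given number of years of data."""
    return years * 12 * mb_per_month


# Both endpoints are included, hence the + 1.
years_available = LAST_YEAR - FIRST_YEAR + 1

if __name__ == "__main__":
    print(f"{years_available} years available, "
          f"about {estimated_size_mb(years_available)} MB in total")
    print(f"Largest single purchase: "
          f"about {estimated_size_mb(MAX_YEARS_PER_BUYER)} MB")
```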
=====================================
For further information, please contact:
ELRA/ELDA                    Tel     +33 01 43 13 33 33
55-57 rue Brillat-Savarin    Fax     +33 01 43 13 33 30
F-75013 Paris, France        E-mail  mapelli at elda.fr
or visit the online catalogue on our Web site:
http://www.icp.grenet.fr/ELRA/home.html
or http://www.elda.fr
=====================================