Corpora: LDC-ELDA: Joint Distribution of LR

Magali Duclaux duclaux at elda.fr
Wed Feb 27 16:39:16 UTC 2002


Cooperation Between ELDA and LDC - Distribution of Language Resources

Networking Data Centers ("Net-DC", MLIS-5017) aims to improve the
infrastructure for language resources by designing and implementing new
modes of cooperation between the Linguistic Data Consortium (LDC) and
the European Language Resources Distribution Agency (ELDA). Within the
framework of this cooperation, LDC and ELDA are pleased to announce the
joint distribution of the following language resources.

Translanguage English Database (TED)
ELRA reference: http://www.elda.fr/cata/speech/S0031.html
LDC reference:  http://www.ldc.upenn.edu/Catalog/LDC2002S04.html

The Translanguage English Database (TED) is a corpus of recordings of
oral presentations made at Eurospeech'93 in Berlin. The corpus name
reflects the high percentage of oral presentations given in English
by non-native speakers of English. Two hundred twenty-four (224) oral
presentations at the conference were successfully recorded, providing a
total of about 75 hours of speech material. These recordings cover a
large number of presenters, speaking multiple variants of English, over
a relatively long stretch of time (15 minutes per presentation plus 5
minutes of discussion), each on a specific topic. This release of TED (6
CD-ROMs) includes 188 speeches, without the ensuing discussion periods.
This database was produced with the support of ELSNET. Associated text
materials consist of ASCII versions of over 400 proceedings papers and
oral preparations supplied by the authors, as well as 250 speaker
questionnaires.

Translanguage English Database (TED) Transcripts
ELRA reference: http://www.elda.fr/cata/speech/S0120.html
LDC reference: http://www.ldc.upenn.edu/Catalog/LDC2002T03.html

The Translanguage English Database (TED) Transcripts corpus contains
transcriptions of thirty-nine of the 188 speeches, recorded at
Eurospeech'93 in Berlin, that make up the TED Corpus (ELRA ref.:
http://www.elda.fr/cata/speech/S0031.html; LDC ref.:
http://www.ldc.upenn.edu/Catalog/LDC2002S04.html). The thirty-nine
transcripts in this publication are in Universal Transcription Format
(UTF) and were prepared by the LDC. All UTF files in the transcript
publication were validated against the included DTD (utf.dtd). Tables
containing speaker demographic information and a cross-reference of
file names from the TED audio corpus are also included.


For further information, please contact ELRA/ELDA or LDC at:

ELRA/ELDA
55-57 rue Brillat-Savarin
F-75013 Paris, France
Tel: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
Email: mapelli at elda.fr
http://www.icp.grenet.fr/ELRA/home.html or http://www.elda.fr

LDC - Linguistic Data Consortium
3615 Market Street, Suite 200
Philadelphia, PA 19104-2608, USA
Tel: (215) 898-0464
Fax: (215) 573-2175
Email: ldc at ldc.upenn.edu
http://www.ldc.upenn.edu
