[Corpora-List] ELRA News

Magali Jeanmaire duclaux at elda.fr
Tue Jun 8 14:50:20 UTC 2004


**********************************************************
ELRA - Language Resources Catalogue - Update
*********************************************************
We are happy to announce that new Language Resources are
now available in our catalogue.

Short descriptions of these resources are given below.
More detailed descriptions are available on our web sites,
at www.elda.fr or www.elra.info.
-------------------------------------------
Written Language Resources
-------------------------------------------
*** W0015 Le Monde Text Corpus - Update ***
Electronic archiving of "Le Monde" articles started on 1 January 1987.
The entire corpus is available in an ASCII text format.
The year 2003 is also available in XML format.

*** W0036/04 Le Monde Diplomatique Text corpus in Arabic ***
Electronic archiving of "Le Monde Diplomatique" articles in Arabic from 1998.
The corpus is available in an ASCII text format.
French and English versions are also available.

-------------------------------------------
Spoken Language Resources
-------------------------------------------
*** S0158 Turkish OrienTel database ***
This speech database contains the recordings of 1,700 Turkish speakers
recorded over the Turkish fixed and mobile telephone network.
Each speaker uttered around 45 read and spontaneous items.

*** S0159 German spoken by Turkish OrienTel database ***
This speech database contains the recordings of 332 Turkish speakers
of German recorded over the German fixed and mobile telephone network.
Each speaker uttered around 53 read and spontaneous items.

*** S0160 Spanish Speecon database ***
The Spanish Speecon database comprises the recordings of 561 adult
Spanish speakers and 55 child Spanish speakers, who uttered over
290 items and 210 items respectively (read and spontaneous).

*** S0161 Russian Speecon database ***
The Russian Speecon database comprises the recordings of 550 adult
Russian speakers and 50 child Russian speakers, who uttered over
290 items and 210 items respectively (read and spontaneous).

*** S0162 Hempel ***
This corpus contains 25.5 hours of recordings by 3,909 German speakers
with a total of 184,240 spoken words, made via public phone lines (fixed
network only). The contents are free monologues answering the question:
"Was haben Sie in der letzten Stunde gemacht?" (What did you do within
the last hour?). The database conforms to the SpeechDat Exchange
Format.


