<html>
<body>
Our apologies if you have received multiple copies of this
announcement<br>
<br>
*******************************************************************<br>
ELRA - Language Resources Catalogue - Update<br>
*******************************************************************<br>
We are happy to announce the following Arabic resources, produced within
the NEMLAR project
(<a href="http://www.nemlar.org/" eudora="autourl">www.nemlar.org</a>).
All 3 resources are owned and copyrighted by the Nemlar Consortium.
They are available in our catalogue.<br>
To view all the Language Resources available, you can visit our on-line
catalogue: <a href="http://www.elra.info/">http://www.elra.info</a>
or <a href="http://www.elda.org/">http://www.elda.org</a> <br>
<b> <br>
*** ELRA-W0042 NEMLAR Written Corpus ***<br>
</b>This corpus consists of about 500,000 words of Arabic text from 13
different categories. The text is provided in 4 different versions:<br>
· Raw text<br>
· Fully vowelized text<br>
· Text with Arabic lexical
analysis<br>
· Text with Arabic
POS-tags<br>
<br>
The database is distributed on 1 ISO 9660 CD-ROM volume.<br>
<br>
For more information, see
<a href="http://catalog.elda.org:8080/product_info.php?products_id=873&osCsid=2eb47737dba8e4365c4972784a235948">
http://catalog.elda.org:8080/product_info.php?products_id=873&osCsid=2eb47737dba8e4365c4972784a235948</a>
<br>
<br>
<b>*** ELRA-S0219 NEMLAR Broadcast News Speech Corpus ***<br>
</b>The corpus consists of about 40 hours of Arabic speech data (mainly
Standard Arabic, from a number of broadcast companies) and is provided by
ELDA. Transcriptions follow the Transcriber conventions as used by ELDA and
focus on the orthographic, named-entity and speaker/turn segmentation
levels. No phonetic transcription or segmentation is planned.<br>
<br>
The database is distributed on 1 ISO 9660 DVD-ROM volume.<br>
<br>
For more information, see
<a href="http://catalog.elda.org:8080/product_info.php?products_id=874&osCsid=2eb47737dba8e4365c4972784a235948">
http://catalog.elda.org:8080/product_info.php?products_id=874&osCsid=2eb47737dba8e4365c4972784a235948</a>
<br>
<br>
<b>*** ELRA-S0220 NEMLAR Speech Synthesis Corpus ***<br>
</b>The NEMLAR Speech Synthesis Corpus contains the
recordings of 2 native Egyptian speakers (male and female, 35 years old)
recorded in a studio over 2 channels (voice + laryngograph). The data
collection and transcription were performed by RDI (Egypt).<br>
<br>
Speech samples are stored at 96 kHz, 24 bits, as signed integers with the
least significant byte first (“lohi” or Intel format).<br>
<br>
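For readers unfamiliar with the sample format described above, here is a
minimal Python sketch of decoding 24-bit signed samples stored least
significant byte first. It assumes headerless data packed at 3 bytes per
sample; the announcement does not state the file layout or how the 2
channels are arranged, so treat this as an illustration only. The function
name decode_s24le is hypothetical and not part of the corpus.<br>
<pre>
```python
def decode_s24le(raw):
    """Decode packed 24-bit signed little-endian samples into Python ints.

    Assumption (not stated in the announcement): the data is headerless
    and packed at exactly 3 bytes per sample.
    """
    if len(raw) % 3 != 0:
        raise ValueError("byte count must be a multiple of 3")
    return [
        # "little" means least significant byte first ("lohi" / Intel order).
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


# Three hand-made samples: 1, -1 and the most negative 24-bit value.
demo = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
samples = decode_s24le(demo)
print(samples)  # prints [1, -1, -8388608]
```
</pre>
If the files turn out to carry a header or to interleave the voice and
laryngograph channels, the bytes would need to be skipped or split
accordingly before decoding.<br>
<br>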
The speaker read 2,032 prompted sentences covering approx. 42,000 words
in three categories: transcribed speech (20%), written text (50%), and
constructed phrases (30%).<br>
<br>
The database is provided with orthographic, prosodic and phonetic
transcriptions in SAMPA. All transcriptions were segmented at the
utterance (sentence/command word) level, annotated at the word level and
checked manually. A pronunciation lexicon including 3,589 headwords with
phonetics in SAMPA is also available.<br>
<br>
The database is distributed on 3 ISO 9660 DVD-ROM volumes.<br>
<br>
For more information, see
<a href="http://catalog.elda.org:8080/product_info.php?products_id=875&osCsid=2eb47737dba8e4365c4972784a235948">
http://catalog.elda.org:8080/product_info.php?products_id=875&osCsid=2eb47737dba8e4365c4972784a235948</a>
<br>
<br><br>
For more information on the catalogue, please contact Valérie Mapelli
(<a href="mailto:mapelli@elda.org" eudora="autourl">mapelli@elda.org</a>).
<br><br>
</body>
</html>