<html>

<body>

Our apologies if you have received multiple copies of this announcement.<br>

 <br>

*******************************************************************<br>

ELRA - Language Resources Catalogue - Update<br>

*******************************************************************<br>

We are happy to announce the following three Arabic resources, produced within the NEMLAR project
(<a href="http://www.nemlar.org/" eudora="autourl">www.nemlar.org</a>).
All three resources are owned and copyrighted by the NEMLAR Consortium and are available in our catalogue.<br>

To view all available Language Resources, visit our online catalogue at
<a href="http://www.elra.info/">http://www.elra.info</a>
or <a href="http://www.elda.org/">http://www.elda.org</a>.<br>

<b> <br>

*** ELRA-W0042 NEMLAR Written Corpus ***<br>

</b>This corpus consists of about 500,000 words of Arabic text from 13
different categories. The text is provided in four versions:<br>

· Raw text<br>
· Fully vowelized text<br>
· Text with Arabic lexical analysis<br>
· Text with Arabic POS tags<br>

 <br>

The database is distributed on 1 ISO 9660 CD-ROM volume.<br>

<b> <br>

</b>For more information, see

<a href="http://catalog.elda.org:8080/product_info.php?products_id=873">http://catalog.elda.org:8080/product_info.php?products_id=873</a>

 <br>

 <br>

<b>*** ELRA-S0219 NEMLAR Broadcast News Speech Corpus ***<br>

</b>The corpus consists of about 40 hours of Arabic broadcast news data
(mainly Standard Arabic from a number of broadcast companies), provided by ELDA.
Transcriptions follow the Transcriber conventions used by ELDA and cover
the orthographic, named-entity and speaker/turn segmentation levels.
No phonetic transcription or segmentation is planned.<br>

 <br>

The database is distributed on 1 ISO 9660 DVD-ROM volume.<br>

 <br>

For more information, see

<a href="http://catalog.elda.org:8080/product_info.php?products_id=874">http://catalog.elda.org:8080/product_info.php?products_id=874</a>

<br>

 <br>

<b>*** ELRA-S0220 NEMLAR Speech Synthesis Corpus ***<br>

</b>The NEMLAR Speech Synthesis Corpus contains recordings of two native
Egyptian speakers (male and female, 35 years old), recorded in a studio
over two channels (voice and laryngograph). Data collection and
transcription were performed by RDI (Egypt).<br>

 <br>

Speech samples are stored as signed 24-bit integers at a sampling rate of 96 kHz,
with the least significant byte first (“lohi” or Intel byte order).<br>
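As an illustration of this sample format, the Python sketch below decodes a buffer of 24-bit signed, least-significant-byte-first samples into integers. It is a minimal sketch that assumes headerless sample data for a single channel; the helper name <code>decode_lohi_24bit</code> is hypothetical and not part of the corpus distribution, and the actual file layout (headers, interleaving of the voice and laryngograph channels) should be checked against the corpus documentation.<br>

```python
from __future__ import annotations


def decode_lohi_24bit(raw: bytes) -> list[int]:
    """Decode 24-bit signed PCM samples stored least significant byte first.

    Assumption: `raw` holds headerless sample data for a single channel.
    `decode_lohi_24bit` is a hypothetical helper, not part of the corpus tools.
    """
    if len(raw) % 3 != 0:
        raise ValueError("length is not a multiple of 3 bytes (24-bit samples)")
    # Each sample occupies 3 bytes; "lohi" means the first byte is the least significant.
    return [
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


# Bytes 01 00 00 -> 1, FF FF FF -> -1, 00 00 80 -> -8388608 (the 24-bit minimum)
print(decode_lohi_24bit(bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])))
```

Dividing each decoded value by 2**23 maps it to the floating-point range [-1, 1) if normalized samples are needed.<br>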

 <br>

The speaker read 2,032 prompted sentences covering approx. 42,000 words

in three categories: transcribed speech (20%), written text (50%), and

constructed phrases (30%).<br>

 <br>

The database is provided with orthographic, prosodic and phonetic
transcriptions in SAMPA. All transcriptions were segmented at the
utterance (sentence/command word) level, annotated at the word level and
checked manually. A pronunciation lexicon of 3,589 headwords with
SAMPA phonetic transcriptions is also available.<br>

 <br>

The database is distributed on 3 ISO 9660 DVD-ROM volumes.<br>

 <br>

For more information, see

<a href="http://catalog.elda.org:8080/product_info.php?products_id=875">http://catalog.elda.org:8080/product_info.php?products_id=875</a>

 <br>

 <br><br>

For more information on the catalogue, please contact Valérie Mapelli
(<a href="mailto:mapelli@elda.org" eudora="autourl">mapelli@elda.org</a>).<br><br>

</body>

</html>