17.2368, Software: NEMLAR Arabic Resources in ELRA Catalogue - 08/06

linguist at LINGUISTLIST.ORG
Tue Aug 22 14:30:08 UTC 2006


LINGUIST List: Vol-17-2368. Tue Aug 22 2006. ISSN: 1068 - 4875.

Subject: 17.2368, Software: NEMLAR Arabic Resources in ELRA Catalogue - 08/06

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Laura Welcher, Rosetta Project / Long Now Foundation  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Svetlana Aksenova <svetlana at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 22-Aug-2006
From: Helene Mazo < mazo at elda.org >
Subject: NEMLAR Arabic Resources in ELRA Catalogue - 08/06

	
-------------------------Message 1 ---------------------------------- 
Date: Tue, 22 Aug 2006 10:27:56
From: Helene Mazo < mazo at elda.org >
Subject: NEMLAR Arabic Resources in ELRA Catalogue - 08/06
 


ELRA - Language Resources Catalogue - Update

We are happy to announce the following Arabic resources, produced within
the NEMLAR project (www.nemlar.org). All 3 resources are owned and
copyrighted by the Nemlar Consortium. They are available in our catalogue.
To view all the Language Resources available, you can visit our on-line
catalogue: http://www.elra.info  or http://www.elda.org
 

ELRA-W0042 NEMLAR Written Corpus

This corpus consists of about 500,000 words of Arabic text from 13
different categories. The text is provided in 4 different versions:
- Raw text
- Fully vowelized text
- Text with Arabic lexical analysis
- Text with Arabic POS-tags
 
The database is distributed on 1 ISO 9660 CD-ROM volume.
 
For more information, see
http://catalog.elda.org:8080/product_info.php?products_id=873&osCsid=2eb47737dba8e4365c4972784a235948
 

ELRA-S0219 NEMLAR Broadcast News Speech Corpus

The data, provided by ELDA, consists of about 40 hours of Arabic speech
(mainly Standard Arabic from a number of broadcast companies).
Transcriptions follow the Transcriber conventions as used by ELDA and focus
on the orthographic, named-entity, and speaker/turn segmentation levels. No
phonetic transcription/segmentation is planned.
 
The database is distributed on 1 ISO 9660 DVD-ROM volume.
 
For more information, see
http://catalog.elda.org:8080/product_info.php?products_id=874&osCsid=2eb47737dba8e4365c4972784a235948
 

ELRA-S0220 NEMLAR Speech Synthesis Corpus

The NEMLAR Speech Synthesis Corpus contains the recordings of 2 native
Egyptian speakers (male and female, 35 years old) recorded in a studio over
2 channels (voice + laryngograph). The data collection and transcription
were performed by RDI (Egypt).
 
Speech samples are stored in 96 kHz, 24 bit with the least significant byte
first ("lohi" or Intel format) as (signed) integers.
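
As an illustration of the sample format described above, here is a minimal
sketch in Python of decoding 24-bit, least-significant-byte-first signed
integer samples. It is not taken from the corpus documentation. It assumes
headerless (raw) sample data with the two channels interleaved frame by
frame; neither assumption is stated in the announcement, and the function
names are hypothetical.

```python
SAMPLE_RATE_HZ = 96_000   # 96 kHz, as stated in the announcement
BYTES_PER_SAMPLE = 3      # 24 bit


def decode_pcm24_le(raw: bytes) -> list:
    """Decode packed 24-bit little-endian signed samples into Python ints."""
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of 3")
    return [
        int.from_bytes(raw[i:i + BYTES_PER_SAMPLE], byteorder="little", signed=True)
        for i in range(0, len(raw), BYTES_PER_SAMPLE)
    ]


def split_channels(samples: list, n_channels: int = 2) -> list:
    """De-interleave samples. ASSUMPTION: channels are interleaved per frame
    (e.g. voice, laryngograph, voice, laryngograph, ...)."""
    return [samples[c::n_channels] for c in range(n_channels)]


def duration_seconds(n_frames: int) -> float:
    """Duration of n_frames sample frames at the stated 96 kHz rate."""
    return n_frames / SAMPLE_RATE_HZ
```

If the files on the DVDs carry a header or store each channel separately,
the header would have to be skipped and split_channels would not apply;
the corpus documentation should be consulted for the actual layout.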
 
The speakers read 2,032 prompted sentences covering approx. 42,000 words in
three categories: transcribed speech (20%), written text (50%), and
constructed phrases (30%).
 
The database is provided with orthographic, prosodic and phonetic
transcriptions in SAMPA.  All transcriptions were segmented at the
utterance (sentence/command word) level, annotated at the word level and
checked manually. A pronunciation lexicon including 3,589 headwords with
phonetics in SAMPA is also available.
 
The database is distributed on 3 ISO 9660 DVD-ROM volumes.
 
For more information, see
http://catalog.elda.org:8080/product_info.php?products_id=875&osCsid=2eb47737dba8e4365c4972784a235948
 

For more information on the catalogue, please contact Valérie Mapelli
(mapelli at elda.org).
Linguistic Field(s): Computational Linguistics
                     Lexicography
                     Phonetics
                     Text/Corpus Linguistics




-----------------------------------------------------------
LINGUIST List: Vol-17-2368	

	


