[Corpora-List] ELRA - Language Resources Catalogue - Update

Wed Sep 21 13:23:11 UTC 2011

Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that four new speech resources from the 
GlobalPhone corpus are now available in its catalogue.
Moreover, an updated version of the Venice Italian Treebank (VIT) has 
also been released.

1) New Language Resources:

The GlobalPhone Corpus:
The GlobalPhone corpus was designed to provide read speech data for the 
development and evaluation of large continuous speech recognition 
systems in the most widespread languages of the world, and to provide a 
uniform, multilingual speech and text database for language-independent 
and language-adaptive speech recognition as well as for language 
identification tasks. The entire GlobalPhone corpus enables the 
acquisition of acoustic-phonetic knowledge of the following 19 spoken 
languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), 
Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian 
(ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German 
(ELRA-S0198), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish 
(ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), 
Spanish (Latin America) (ELRA-S0203), Swedish (ELRA-S0204), Tamil 
(ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), and Vietnamese 
(ELRA-S0322). In each language, about 100 sentences were read by each 
of the 100 speakers. The read texts were selected from national 
newspapers available via the Internet to provide a large vocabulary (up 
to 65,000 words). The read articles cover national and international 
political news as well as economic news.

Special prices are offered for a combined purchase of several 
GlobalPhone languages (5 languages, 10 languages, 15 languages or 19 
languages).

Four new languages are available from the GlobalPhone corpus:
ELRA-S0319 GlobalPhone Bulgarian
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1141
ELRA-S0320 GlobalPhone Polish
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1142
ELRA-S0321 GlobalPhone Thai
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1143
ELRA-S0322 GlobalPhone Vietnamese
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1144

2) Update of ELRA-W0040 Venice Italian Treebank (VIT):
The new version of VIT has a completely revised constituent-based 
representation and an entirely new dependency-based representation, 
the latter produced by semi-automatic procedures.

The Venice Italian Treebank (VIT) contains about 272,000 words 
distributed over six different domains: bureaucratic, political, 
economic and financial, literary, scientific, and news. In addition, 
some 60,000 tokens of spoken dialogues in different Italian varieties 
were annotated.
The annotation follows general X-bar criteria with 29 constituency 
labels and 102 PoS tags. VIT is also made available in a broad 
annotation version with 10 constituency labels and 22 PoS tags for 
machine learning purposes. The format is plain text with square 
bracketing. However, a UPenn-style version which is readable by the 
open-source query language CorpusSearch is also provided.

For more information, see: 
http://catalog.elra.info/product_info.php?products_id=831
