[Corpora-List] ELRA - Language Resources Catalogue - Update
Info
info at elda.org
Wed Sep 21 13:23:11 UTC 2011
Our apologies if you have received multiple copies of this announcement.
*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is happy to announce that 4 new Speech Resources from the
GlobalPhone corpus are now available in its catalogue.
Moreover, an updated version of the Venice Italian Treebank (VIT) has
also been released.
1) New Language Resources:
The GlobalPhone Corpus: *The GlobalPhone corpus was designed to provide
read speech data for the development and evaluation of large continuous
speech recognition systems in the most widespread languages of the
world, and to provide a uniform, multilingual speech and text database
for language independent and language adaptive speech recognition as
well as for language identification tasks. The entire GlobalPhone corpus
enables the acquisition of acoustic-phonetic knowledge of the following
19 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319),
Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian
(ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German
(ELRA-S0198), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish
(ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202),
Spanish (Latin America) (ELRA-S0203), Swedish (ELRA-S0204), Tamil
(ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Vietnamese
(ELRA-S0322). In each language, about 100 sentences were read by each
of the 100 speakers. The read texts were selected from national
newspapers available via the Internet to provide a large vocabulary (up to
65,000 words). The read articles cover national and international
political news as well as economic news.
Special prices are offered for a combined purchase of several
GlobalPhone languages (5 languages, 10 languages, 15 languages or 19
languages).*
*Four new languages are available from the GlobalPhone corpus*:
*ELRA-S0319 GlobalPhone Bulgarian*
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1141
*ELRA-S0320 GlobalPhone Polish*
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1142
*ELRA-S0321 GlobalPhone Thai*
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1143
*ELRA-S0322 GlobalPhone Vietnamese*
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1144
2) Update of ELRA-W0040 Venice Italian Treebank (VIT):
The new version of VIT has a totally revised constituent-based
representation and a completely new dependency-based representation
which has been achieved by semi-automatic procedures.
The Venice Italian Treebank (VIT) contains about 272,000 words
distributed over six different domains: bureaucratic, political,
economic and financial, literary, scientific, and news. In addition,
some 60,000 tokens of spoken dialogues in different Italian varieties
were annotated.
The annotation follows general X-bar criteria with 29 constituency
labels and 102 PoS tags. VIT is also made available in a broad
annotation version with 10 constituency labels and 22 PoS tags for
machine learning purposes. The format is plain text with square
bracketing. However, a UPenn-style version, which is readable by the
open-source query language CorpusSearch, is also provided.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=831