[Corpora-List] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information info at elda.org
Wed Dec 12 16:22:17 UTC 2012


Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that 4 new Written Corpora are now available 
in its catalogue.
ELRA-W0059 LT Corpus
The LT Corpus is composed of 70 fiction texts by renowned Portuguese 
authors. The corpus contains 1,781,083 tokens. The texts date from 
before 1940. The corpus is delivered in one file, in two different 
formats. The txt version has one sentence per line, an identification 
number for each text and no further annotation. The cqpweb version has 
one token per line, each followed by its POS tag and lemma, and is 
annotated for NP chunks.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1178

ELRA-W0060 PTPARL Corpus
The PTPARL Corpus contains 1,076 texts consisting of adapted 
transcriptions of Portuguese Parliament sessions. The corpus 
contains 1,000,441 tokens. The corpus is delivered in one file, in two 
different formats. The txt version has one sentence per line, an 
identification number for each text and no further annotation. The 
cqpweb version has one token per line, each followed by its POS tag and 
lemma, and is annotated for NP chunks.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1179
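
As a rough illustration of the token-per-line layout described for the 
cqpweb versions of the two corpora above, here is a minimal Python sketch 
of a reader. The announcement does not state the column separator, the 
sentence delimiters or the tag set, so the tab separator, the <s> ... </s> 
sentence markers, the name read_vertical and the sample tags are all 
assumptions made for this example, not properties of the delivered files.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    form: str
    pos: str
    lemma: str


def read_vertical(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences from token-per-line text.

    Assumptions (not taken from the catalogue entry): columns are
    tab-separated in the order form, POS tag, lemma, and sentences are
    closed by a line containing only </s>. Any other line with fewer
    than three columns (e.g. chunk markup) is skipped.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        cols = line.split("\t")
        if len(cols) < 3:
            # Structural or empty line: flush the sentence on </s>.
            if line.strip() == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(Token(cols[0], cols[1], cols[2]))
    if sentence:
        # Emit a trailing sentence that was never explicitly closed.
        yield sentence


# Made-up sample; the tags are placeholders, not the corpus tag set.
sample = ["<s>", "O\tDA\to", "gato\tCN\tgato", "dorme\tV\tdormir", "</s>"]
sentences = list(read_vertical(sample))
```

With the sample above, sentences holds a single sentence of three Token 
records; the real files should be checked against their own documentation 
before relying on any of these conventions.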

ELRA-W0061 CINTIL-DependencyBank
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of 
sentences annotated with their syntactic dependency graphs and 
grammatical function tags. It is composed of 10,039 sentences and 
110,166 tokens taken from different sources and domains: news (8,861 
sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). 
These totals also include 779 sentences (5,654 tokens) that are used for 
regression testing of the computational grammar that supported the 
annotation of the corpus.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1180

ELRA-W0062 CINTIL-DeepBank
The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences 
annotated with their full-fledged deep grammatical representations. It 
is composed of 10,039 sentences and 110,166 tokens taken from different 
sources and domains: news (8,861 sentences; 101,430 tokens) and novels 
(399 sentences; 3,082 tokens). These totals also include 779 sentences 
(5,654 tokens) used for regression testing of the computational grammar 
that supported the annotation of the corpus.
For more information, see: 
http://catalog.elra.info/product_info.php?products_id=1181


For more information on the catalogue, please contact Valérie Mapelli 
mailto:mapelli at elda.org

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: 
http://www.elra.info/LRs-Announcements.html