[Corpora-List] ELRA - Language Resources Catalogue - Update
ELRA ELDA Information
info at elda.org
Wed Dec 12 16:22:17 UTC 2012
Our apologies if you have received multiple copies of this announcement.
*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************
ELRA is happy to announce that four new written corpora are now
available in its catalogue.
*ELRA-W0059 LT Corpus*
The LT Corpus is composed of 70 fiction texts by renowned Portuguese
authors. The corpus contains 1,781,083 tokens. The texts date from
before 1940. The corpus is delivered in one file, in two different
formats. The txt version has one sentence per line, an identification
number for each text and no further annotation. The CQPweb version has
one token per line, followed by its POS tag and lemma, and is annotated
for NP chunks.
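The CQPweb version described above is a vertical (one-token-per-line) format. As a rough illustration only, the Python sketch below reads such a file into (word, POS, lemma, in-NP) records. It assumes, without confirmation from the catalogue entry, that the three columns are tab-separated and that NP chunks and other structure are marked by XML-style lines such as <np> ... </np> and <s>. The function name read_vertical, the Token record and the sample tags are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str
    in_np: bool  # True if the token lies inside an NP chunk


# A line consisting only of an XML-style tag, e.g. <s>, </np>, <text id="3">.
# These tag names are assumptions; ELRA does not document the exact markup.
TAG_LINE = re.compile(r"^</?(\w+)[^>]*>$")


def read_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield tokens from word<TAB>POS<TAB>lemma lines, tracking <np> chunks."""
    np_depth = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        tag = TAG_LINE.match(stripped)
        if tag:
            if tag.group(1).lower() == "np":
                if stripped.startswith("</"):
                    np_depth = max(0, np_depth - 1)  # close an NP chunk
                else:
                    np_depth += 1  # open an NP chunk
            continue  # other structural tags (<s>, <text ...>) are ignored
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected word, POS and lemma columns: {line!r}")
        word, pos, lemma = fields[:3]
        yield Token(word, pos, lemma, np_depth > 0)


# Hypothetical sample; the real tagset and markup may differ.
sample = [
    "<s>",
    "<np>",
    "O\tDA\to",
    "gato\tCN\tgato",
    "</np>",
    "dorme\tV\tdormir",
    "</s>",
]
tokens = list(read_vertical(sample))
for t in tokens:
    print(t)
```

Because read_vertical is a generator, a large file such as the 1,781,083-token LT Corpus can be streamed line by line (assuming UTF-8 encoding) by passing an open file object instead of the sample list.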
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1178
*ELRA-W0060 PTPARL Corpus*
The PTPARL Corpus contains 1,076 texts consisting of adapted
transcriptions of sessions of the Portuguese Parliament. The corpus
contains 1,000,441 tokens. The corpus is delivered in one file, in two
different formats. The txt version has one sentence per line, an
identification number for each text and no further annotation. The
CQPweb version has one token per line, followed by its POS tag and
lemma, and is annotated for NP chunks.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1179
*ELRA-W0061 CINTIL-DependencyBank*
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of
sentences annotated with their syntactic dependency graphs and
grammatical function tags. It is composed of 10,039 sentences and
110,166 tokens taken from different sources and domains: news (8,861
sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens).
The remaining 779 sentences (5,654 tokens) are used for regression
testing of the computational grammar that supported the annotation of
the corpus.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1180
*ELRA-W0062 CINTIL-DeepBank*
The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences
annotated with their full-fledged deep grammatical representations. It
is composed of 10,039 sentences and 110,166 tokens taken from different
sources and domains: news (8,861 sentences; 101,430 tokens) and novels
(399 sentences; 3,082 tokens). The remaining 779 sentences (5,654
tokens) are used for regression testing of the computational grammar
that supported the annotation of the corpus.
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1181
For more information on the catalogue, please contact Valérie Mapelli
mailto:mapelli at elda.org
Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates:
http://www.elra.info/LRs-Announcements.html
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora