23.2276, FYI: GerManC Corpus is Now Available
linguist at linguistlist.org
Fri May 11 16:10:57 UTC 2012
LINGUIST List: Vol-23-2276. Fri May 11 2012. ISSN: 1069 - 4875.
Subject: 23.2276, FYI: GerManC Corpus is Now Available
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Veronika Drake, U of Wisconsin-Madison
Monica Macaulay, U of Wisconsin-Madison
Rajiv Rao, U of Wisconsin-Madison
Joseph Salmons, U of Wisconsin-Madison
Anja Wanner, U of Wisconsin-Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org
The LINGUIST List is a non-profit organization dedicated to providing the
discipline of linguistics with the infrastructure necessary to function in
the digital world. Donate to keep our services freely available!
https://linguistlist.org/donation/donate/donate1.cfm
Editor for this issue: Brent Miller <brent at linguistlist.org>
================================================================
Visit LL's Multitree project for over 1000 trees dynamically generated
from scholarly hypotheses about language relationships:
http://multitree.linguistlist.org/
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.cfm.
Date: Fri, 11 May 2012 12:10:51
From: Richard Whitt [jasonwhitt at mindspring.com]
Subject: GerManC Corpus is Now Available
The complete GerManC Corpus, a representative corpus of Early
Modern German from 1650 to 1800, is now publicly available at the
Oxford Text Archive:
http://www.ota.ox.ac.uk/desc/2544
Following the model of the ARCHER corpus and given the aim of
representativeness, the GerManC corpus consists of text samples of
about 2000 words from eight genres: drama, newspapers, sermons
and personal letters (to represent orally oriented registers) and
narrative prose (fiction or non-fiction), scholarly (i.e. humanities),
scientific and legal texts (to represent more print-oriented registers). In
order to facilitate tracing historical developments, the whole period was
divided into fifty-year sections (in this case 1650-1700, 1700-1750 and
1750-1800), and an equal number of texts from each genre was
selected for each of these sub-periods.
The complete corpus thus consists of 360 samples, comprising
approximately 800,000 words. Appendix 1 in the download package
contains a list of the files in the corpus, with full documentation in an
Excel spreadsheet.
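
As a rough check on the design figures above, the following Python sketch
works out how many samples fall into each genre/sub-period cell and how the
nominal sample size relates to the stated total. All numbers are taken from
this announcement; the variable and function names are illustrative only and
are not part of the corpus distribution. The sketch assumes nothing about
further stratification beyond genre and sub-period.

```python
# Figures as given in the announcement; names are hypothetical.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360
NOMINAL_SAMPLE_WORDS = 2000      # "about 2000 words" per sample
STATED_TOTAL_WORDS = 800_000     # "approximately 800,000 words"


def samples_per_cell(total: int, n_genres: int, n_periods: int) -> int:
    """Samples in each genre/sub-period cell, assuming an even split."""
    cells = n_genres * n_periods
    if total % cells != 0:
        raise ValueError("total does not divide evenly across cells")
    return total // cells


per_cell = samples_per_cell(TOTAL_SAMPLES, len(GENRES), len(PERIODS))
nominal_words = TOTAL_SAMPLES * NOMINAL_SAMPLE_WORDS
implied_mean_words = STATED_TOTAL_WORDS / TOTAL_SAMPLES

print(f"samples per genre and sub-period: {per_cell}")
print(f"words at the nominal sample size: {nominal_words}")
print(f"mean sample size implied by the stated total: {implied_mean_words:.0f}")
```

With an even split, each genre contributes 15 samples per sub-period. At
exactly 2000 words per sample the corpus would hold 720,000 words, so the
stated total of approximately 800,000 suggests that samples run somewhat
above the nominal size on average.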
Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke
Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA,
Newspaper Corpus).
Linguistic Field(s): Computational Linguistics
Historical Linguistics
Text/Corpus Linguistics