<html>
I'm trying to create a listing of HISTORICAL corpora for languages
besides Spanish (I already have that), beyond those already on the
ICAME CD-ROM, which includes The Helsinki Corpus of English Texts, The
Helsinki Corpus of Older Scots, Corpus of Early English Correspondence,
The Newdigate Newsletters, Lampeter Corpus, and the Innsbruck Computer-Archive of
Machine-Readable English Texts (ICAMET) [see
<a href="http://www.hd.uib.no/icame/newcd.htm" eudora="autourl">http://www.hd.uib.no/icame/newcd.htm</a>].<br>
<br>
Here's a listing of what I have so far: <br>
<br>
Language / Name / URL / Approx. time period / Approx. size<br>
<br>
1) English / Penn-Helsinki Parsed Corpus of Middle English /
<a href="http://www.ling.upenn.edu/mideng/" eudora="autourl"><font face="Arial, Helvetica">http://www.ling.upenn.edu/mideng/</a>
/ 1150-1500 / 1,200,000 words [based on the Helsinki corpus]<br>
<br>
2) English / </font>Penn-Helsinki Parsed Corpus of Old English <font face="Arial, Helvetica">/ Info at <a href="http://linguistlist.org/issues/10/10-1956.html" eudora="autourl">http://linguistlist.org/issues/10/10-1956.html</a> / 850-1150 / 420,000 words [based on the Helsinki corpus]<br>
<br>
3) French / </font>ARTFL (<font face="Arial, Helvetica">Trésor de la langue française) / <a href="http://humanities.uchicago.edu/ARTFL/artfl.flyer.html" eudora="autourl">http://humanities.uchicago.edu/ARTFL/artfl.flyer.html</a> / 1600 onward / 115,000,000 words<br>
<br>
4) Swedish / </font>Projektet Källtext / <a href="http://spraakdata.gu.se/ktext/" eudora="autourl"><font face="Arial, Helvetica">http://spraakdata.gu.se/ktext/</a> / ???? / 2,000,000 words<br>
<br>
5) German / Projekt Gutenberg / <a href="http://gutenberg.aol.de/info/projekt.htm" eudora="autourl">http://gutenberg.aol.de/info/projekt.htm</a> / Mostly 1900s, but a few earlier / 300 texts (# words ??)<br>
<br>
6) Portuguese / </font>Tycho Brahe Parsed Corpus of Historical Portuguese / <a href="http://www.ime.usp.br/~tycho/" eudora="autourl"><font face="Arial, Helvetica">http://www.ime.usp.br/~tycho/</a> / c1600-1900 / Goal of 1,000,000 words<br>
<br>
7) Chinese / </font>Historical Corpora for Synchronic and Diachronic Linguistics Studies / <a href="http://rocling.iis.sinica.edu.tw/CLCLP/Vol2-1/a6.htm" eudora="autourl"><font face="Arial, Helvetica">http://rocling.iis.sinica.edu.tw/CLCLP/Vol2-1/a6.htm</a> / Pre-Qin to Chang dynasties (time period??) / 17,000,000 characters<br>
<br>
</font>As can be seen, I haven't yet identified many HISTORICAL corpora for German, Dutch, Norwegian, Icelandic, Italian, Romanian, Hungarian, Finnish, any of the Slavic languages, or any of the other European languages. In addition, the only non-European language for which I can find anything is Chinese. (Also, I know that there are/must be nice collections of classical Greek and Latin in electronic form and on the Web [due to the large number of classical texts], but I haven't compiled a list of these yet.) <br>
<br>
<font face="Arial, Helvetica">At any rate, if anyone has information on other historical corpora for the desired languages, I'd appreciate your sending me a URL for the resources. I will be creating a webpage with links to the historical corpora and will announce this on CORPORA in about a week, when I've received feedback from others. <br>
<br>
Thanks in advance for your help.<br>
<br>
Mark Davies<br>
<br>
<br>
</font><br>
<div>=======================================</div>
<div>Mark Davies, Associate Professor, Spanish Linguistics</div>
<div>Dept. of Foreign Languages, Illinois State University</div>
<div>Normal, IL 61790-4300</div>
<br>
<div>Voice:309/438-7975 email:mdavies@ilstu.edu</div>
<div>Fax:309/438-8038 <a href="http://mdavies.for.ilstu.edu/personal/" EUDORA=AUTOURL>http://mdavies.for.ilstu.edu/personal/</a></div>
=======================================
</html>