<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.3020" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV>Many thanks to everyone who responded to my recent query about free online
corpora. Here is a summary of the responses I have received: </DIV>
<DIV> </DIV>
<DIV>Jenny in Hong Kong directed me to the Hong Kong Polytechnic
University's Virtual Language Centre <A
href="http://vlc.polyu.edu.hk/">http://vlc.polyu.edu.hk/</A>, which takes you to
a concordancer with different corpora.</DIV>
<DIV> </DIV>
<DIV dir=ltr><FONT color=#000000>Lene Petersen highlighted the KEMPE <EM>Korpus
of Early Modern Playtexts in English</EM> which is available to search free of
charge via <A
href="http://corp.hum.sdu.dk/cqp.en.html">http://corp.hum.sdu.dk/cqp.en.html</A>. "The VISL
site also hosts wikipedia and chat corpora that are password
free."</FONT></DIV>
<DIV><BR>Jörg Tiedemann pointed me to the OPUS collection of parallel corpora
(including English). There is an on-line search interface at <A
href="http://logos.uio.no/cgi-bin/opus/opuscqp.pl">http://logos.uio.no/cgi-bin/opus/opuscqp.pl</A>,
and another (hidden) search interface for Europarl with some more features:
<A
href="http://logos.uio.no/opus/EUROPARL/frames-cqp.html">http://logos.uio.no/opus/EUROPARL/frames-cqp.html</A></DIV>
<DIV> </DIV>
<DIV>Elzbieta Dura <FONT color=#000080>mentioned<SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><FONT
face="Times New Roman" size=3> </FONT><A href="http://bergelmir.iki.his.se/culler/"><FONT
face="Times New Roman"
size=3>http://bergelmir.iki.his.se/culler/</FONT></A><FONT size=3><FONT
face="Times New Roman"> <FONT color=#000000>where there are a number of corpora
in biomedicine and also an English-Swedish JRC-Acquis parallel corpus. At <A
href="http://www.nla.se.culler">http://www.nla.se.culler</A> there is a corpus
of older English. She also noted that comments on the corpus tool Culler
are welcome.</FONT></FONT></FONT></SPAN></FONT><FONT color=#000000><FONT
face="Times New Roman"><FONT size=3><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">
</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT color=#000000><FONT face="Times New Roman"><FONT size=3><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"></SPAN> </FONT></FONT></FONT></DIV>
<DIV>Michaela Geierhos said: "Perhaps you are already aware of Mark
Davies's TIME corpus. He provides a web interface to do basic KWIC, collocates,
n-gram searches, etc. TIME corpus (new May 2007; 100m words; US 1900s) <A
href="http://view.byu.edu/timemag"
target=_blank>http://view.byu.edu/timemag</A>. Another quite useful thing is
GlossaNet. It's <FONT style="COLOR: rgb(0,0,0)" color=#cccccc>a search engine
that gives you daily access to the online editions of more than 100 newspapers
in 12 languages. </FONT><SPAN><A href="http://glossa.fltr.ucl.ac.be/"
target=_blank>http://glossa.fltr.ucl.ac.be/</A>. </SPAN>It requires
registration for intensive use because it's possible to get the concordances of
all chosen newspapers daily or weekly etc. by e-mail. You can also take a look
at the system before registering: <SPAN><A
href="http://glossa.fltr.ucl.ac.be/scripts/gtoday/gtoday.pl"
target=_blank>http://glossa.fltr.ucl.ac.be/scripts/gtoday/gtoday.pl</A>. </SPAN>There
you'll see an overview of all accessible newspapers by language."<BR></DIV>
<DIV> </DIV>
<DIV>Eckhard Bick highlighted the English section of Corpus Eye (at <A
href="http://corp.hum.sdu.dk">http://corp.hum.sdu.dk</A>), which contains a
number of further online corpora (all morphologically and syntactically
annotated and searchable), of which the following are
password-free: Europarl corpus (25.7 mill. words); Wikipedia corpus
(115 mill. words); Chat corpus (23.5 mill. words); KEMPE Shakespeare
corpus (8.9 mill. words); Enron e-mail corpus (75 mill. words)</DIV>
<DIV> </DIV>
<DIV>Ana Frankenberg directed me to the COMPARA corpus, a 3 million-word
bidirectional parallel corpus of English and Portuguese. "People can use just
the English (or just the Portuguese) side of the corpus if they wish. The corpus
is online, free and requires no registration. See <A
href="http://www.linguateca.pt/COMPARA/Welcome.html">http://www.linguateca.pt/COMPARA/Welcome.html</A>"<BR></DIV>
<DIV> </DIV>
<DIV>Elisa Duarte Teixeira and Stella Tagnin told me that "the English part
of the CorTec corpus, a Portuguese-English technical comparable corpus,
which is part of the COMET Project (Multilingual Corpora for Teaching and
Translation), can be freely searched at this address: (<A
href="http://www.fflch.usp.br/dlm/comet/consulta_cortec.html">http://www.fflch.usp.br/dlm/comet/consulta_cortec.html</A>). Although
the English version of the site is not finished, there you'll find the
documentation that explains the composition of the 5 corpora in English. Soon,
all the 5 corpora will receive more texts and new areas will be added
- we'll announce it here, when it's ready." Stella Tagnin also
pointed out a monolingual Brazilian Portuguese Corpus - Lácio-Web, at <A
href="http://www.nilc.icmc.usp.br/lacioweb">www.nilc.icmc.usp.br/lacioweb</A>.
</DIV>
<DIV> </DIV>
<DIV>Huaqing Hong suggested the SCoRE corpus at: <A
href="http://score.crpp.nie.edu.sg/">http://score.crpp.nie.edu.sg/</A>. You can
register online to try the demo version. <BR></DIV>
<DIV>Ilya at the Linguistic Data Consortium directed me to: <A
class=moz-txt-link-freetext
href="https://online.ldc.upenn.edu/login.html">https://online.ldc.upenn.edu/login.html</A> to
sign up for a guest account to LDC Online. "With a guest account, you can search
a subset of English newstext the LDC has acquired, as well as search and listen
to English telephone conversations. The American English Spoken Lexicon is
also included."<BR></DIV>
<DIV> </DIV>
<DIV>Stefan Bordag suggested I look at corpora.uni-leipzig.de, which contains an
English corpus as well as others and is freely accessible online, as well as
downloadable. </DIV>
<DIV> </DIV>
<DIV>Ralf Steinberger highlighted the 55 million word <SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><FONT
face="Times New Roman" color=#000000 size=3>English part of the multilingual
parallel corpus JRC-Acquis. "The overall corpus, including all 22 languages,
consists of over 1 billion words. </FONT></SPAN><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><FONT
face="Times New Roman" size=3><FONT color=#000000>You cannot search the corpus
via a web interface, but you can simply download the JRC-Acquis documents from
the site</FONT><FONT color=navy> </FONT><A href="http://langtech.jrc.it/JRC-Acquis.html"><FONT face="Times New Roman"
size=3>http://langtech.jrc.it/JRC-Acquis.html</FONT></A><FONT
face="Times New Roman" size=3>."</FONT></FONT></SPAN><BR></DIV>
<DIV> </DIV>
<DIV>For completeness, here are the corpora I included in my first message:
</DIV>
<DIV> </DIV>
<DIV>BNC (<A href="http://www.natcorp.ox.ac.uk/">http://www.natcorp.ox.ac.uk/</A>)<BR>VIEW interface to the
BNC (<A href="http://view.byu.edu/">http://view.byu.edu/</A>)<BR>COBUILD Corpus Concordance
Sampler (<A
href="http://www.collins.co.uk/corpus/CorpusSearch.aspx">http://www.collins.co.uk/corpus/CorpusSearch.aspx</A>)<BR>SCOTS
(<A href="http://www.scottishcorpus.ac.uk">http://www.scottishcorpus.ac.uk</A>)<BR>ELISA (<A
href="http://www.uni-tuebingen.de/elisa/html/elisa_index.html">http://www.uni-tuebingen.de/elisa/html/elisa_index.html</A>)<BR>Compleat
Lexical Tutor (access to Brown and BNC sampler among others) (<A
href="http://www.lextutor.ca/">http://www.lextutor.ca/</A>)<BR>Virtual Language
Centre Web Concordancer (access to Brown, LOB among others) (<A
href="http://www.edict.com.hk/default.htm">http://www.edict.com.hk/default.htm</A>)<BR>IViE Corpus (<A
href="http://www.phon.ox.ac.uk/IViE/">http://www.phon.ox.ac.uk/IViE/</A>)<BR>Speech Accent Archive (<A
href="http://accent.gmu.edu/">http://accent.gmu.edu/</A>)<BR></DIV>
<DIV> </DIV>
<DIV>Thanks again!</DIV>
<DIV> </DIV>
<DIV>Wendy</DIV>
<DIV>....................<BR>Dr Wendy J Anderson<BR>Scottish Corpus of Texts and
Speech<BR>Department of English Language<BR>University of Glasgow<BR>12
University Gardens<BR>Glasgow<BR>G12 8QQ<BR>Scotland, UK</DIV>
<DIV> </DIV>
<DIV>Website: <A
href="http://www.scottishcorpus.ac.uk">http://www.scottishcorpus.ac.uk</A><BR></DIV></BODY></HTML>