[Corpora-List] Query: free online corpora

Ralf Steinberger ralf.steinberger at jrc.it
Wed May 30 14:19:53 UTC 2007


The English part of the multilingual parallel corpus JRC-Acquis comprises
55 million words. The overall corpus, including all 22 languages, consists
of over 1 billion words.

 

You cannot search the corpus via a web interface, but you can simply
download the JRC-Acquis documents from the site
http://langtech.jrc.it/JRC-Acquis.html. 

 

Ralf

 

 

 

Ralf Steinberger (Ralf.Steinberger at jrc.it)

European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it,
http://press.jrc.it/NewsExplorer)





The JRC's Language Technology group specialises in the development of highly
multilingual text analysis tools and in cross-lingual applications. An
example is our multilingual (19 languages) news analysis application
NewsExplorer, publicly accessible at http://press.jrc.it/NewsExplorer.

 

Related JRC developments (both covering 22+ languages):

- NewsBrief (http://press.jrc.it): breaking news detection and display of the
very latest thematic news from around the world;

- Medical Information System MedISys (http://medusa.jrc.it): displays the
latest health-related news from around the world according to themes and
diseases.

 

 

 

From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Dr Wendy Anderson
Sent: 29 May 2007 13:56
To: CORPORA at UIB.NO
Subject: [Corpora-List] Query: free online corpora

 

Dear all, 

I'm looking for online corpora of English which can be accessed and searched
online, free of charge (either requiring registration or not). I'm
interested in all varieties, time periods, genres, modes, levels of
annotation, etc., and it doesn't matter whether a search returns all data or
only a sample. 

I'd be very grateful if Corpora members could let me know of resources which
fit these criteria and are not already listed at the foot of this message.
I'll post a summary of responses in due course. 

 

Thank you!

Wendy

 

I'm aware of the following corpora: 

BNC (http://www.natcorp.ox.ac.uk/) 

VIEW interface to the BNC (http://view.byu.edu/)

COBUILD Corpus Concordance Sampler
(http://www.collins.co.uk/corpus/CorpusSearch.aspx)

SCOTS (http://www.scottishcorpus.ac.uk)

ELISA (http://www.uni-tuebingen.de/elisa/html/elisa_index.html)

Compleat Lexical Tutor (access to Brown and BNC sampler among others)
(http://www.lextutor.ca/)

Virtual Language Centre Web Concordancer (access to Brown, LOB among others)
(http://www.edict.com.hk/default.htm)

IViE Corpus (http://www.phon.ox.ac.uk/IViE/)

Speech Accent Archive (http://accent.gmu.edu/)

....................
Dr Wendy J Anderson
Scottish Corpus of Texts and Speech
Department of English Language
University of Glasgow
12 University Gardens
Glasgow
G12 8QQ
Scotland, UK

 

Website: http://www.scottishcorpus.ac.uk
