[Corpora-List] Spanish reference corpus

Adam Kilgarriff adam at lexmasterclass.com
Fri Feb 2 07:54:00 UTC 2007


Mario,

Yes, the frequencies etc are available for this corpus via the Sketch
Engine, a corpus query tool which allows the user to specify and collect
frequency lists to a wide range of specifications (as well as offering a
range of other functions including concordancing, 'word sketches' and a
distributional thesaurus).  

We have taken the URL list as supplied by Serge Sharoff, re-collected the
corpus (or, at least, a 95% similar corpus) and installed it into the Sketch
Engine.  You can self-register for a trial account at
http://www.sketchengine.co.uk

Enjoy!

Adam

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Mario Crespo Miguel
Sent: 01 February 2007 13:17
To: s.sharoff at leeds.ac.uk
Cc: corpora at lists.uib.no
Subject: Re: [Corpora-List] Spanish reference corpus

Thank you very much for helping me, but I think it would be more
convenient for me if the word frequencies of this open-domain /
general corpus could be obtained. Does anybody know whether such
information is available in some way? Best,

Mario



On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff at leeds.ac.uk>
wrote:

> one answer is the Spanish Internet corpus with the interface from
> http://corpus.leeds.ac.uk/internet.html
> and the URL list 
> http://corpus.leeds.ac.uk/internet/final-url-es.gz
> 
> This is a random snapshot of the Spanish Internet of about 120 million
> words, see
> Sharoff, S (2006) Creating general-purpose corpora using automated
> search engine queries. In Marco Baroni and Silvia Bernardini, editors,
> WaCky! Working papers on the Web as Corpus. Gedit, Bologna.
> http://wackybook.sslmit.unibo.it/
> 
> S
> 
> On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
>> Dear everybody,
>> 
>> Thank you again for all the help that I always get from this
>> mailing list. This time I would like to ask whether there is a
>> reference / open-domain corpus for Spanish that is freely
>> available and can be downloaded. Thank you in advance. Best
>> wishes,
>> 
>> Mario Crespo Miguel
>> 
>> 
> 
> 


