[Corpora-List] Spanish reference corpus

Serge Sharoff s.sharoff at leeds.ac.uk
Fri Feb 2 08:24:50 UTC 2007


yes, the frequency list is also available:
http://corpus.leeds.ac.uk/frqc/internet-es-forms.num (for word forms)
http://corpus.leeds.ac.uk/frqc/internet-es.num (for lemmas, though the
results of automatic lemmatisation should be treated with caution).

BTW, the frequencies (the second column) are in terms of ipm (instances
per million words).
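
To illustrate how ipm relates to raw counts, here is a minimal Python sketch. It assumes each line of the list has three whitespace-separated fields (rank, ipm, item); only the position of the second column is stated above, so the rest of the layout is an assumption. The corpus size of 120 million words is the approximate figure quoted further down in this thread, and the sample lines are invented for illustration, not taken from the actual list.

```python
from typing import Iterable, Iterator, NamedTuple

# Approximate corpus size ("about 120 million words"); an assumption, not an exact count.
CORPUS_SIZE = 120_000_000


class Entry(NamedTuple):
    rank: int
    ipm: float  # instances per million words (second column)
    item: str   # word form or lemma


def parse_frequency_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from lines shaped like 'rank ipm item' (assumed layout)."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip headers, blank lines, or anything unexpected
        try:
            yield Entry(int(parts[0]), float(parts[1]), parts[2])
        except ValueError:
            continue  # skip lines whose numeric fields do not parse


def estimated_count(ipm: float, corpus_size: int = CORPUS_SIZE) -> int:
    """Convert ipm back to an approximate raw count for the given corpus size."""
    return round(ipm * corpus_size / 1_000_000)


# Invented sample lines, for illustration only.
sample = ["1 50000.00 de", "2 30000.50 la", "header line to skip"]
entries = list(parse_frequency_lines(sample))
for entry in entries:
    print(entry.item, entry.ipm, estimated_count(entry.ipm))
```

Because the corpus size is only approximate, the recovered counts are rough estimates; ipm itself is the safer figure for comparisons across corpora of different sizes.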

Serge

On Thu, 2007-02-01 at 14:17 +0100, Mario Crespo Miguel wrote:
> Thank you very much for helping me, but it would be more 
> convenient for me if the word frequencies of this open-domain / 
> general corpus could be obtained. Does anybody know whether such 
> information is available somewhere? Best,
> 
> Mario
> 
> 
> 
> On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff at leeds.ac.uk> 
> wrote:
> 
> > one answer is the Spanish Internet corpus with the interface from
> > http://corpus.leeds.ac.uk/internet.html
> > and the URL list 
> > http://corpus.leeds.ac.uk/internet/final-url-es.gz
> > 
> > This is a random snapshot of the Spanish Internet of about
> > 120 million words, see
> > Sharoff, S (2006) Creating general-purpose corpora using
> > automated search engine queries. In Marco Baroni and Silvia
> > Bernardini, editors, WaCky! Working papers on the Web as Corpus.
> > Gedit, Bologna.
> > http://wackybook.sslmit.unibo.it/
> > 
> > S
> > 
> > On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
> >> Dear everybody,
> >> 
> >> Thank you again for all the help that I always get from this 
> >> mailing list. This time I would like to ask whether there is a 
> >> reference / open-domain corpus for Spanish that is freely 
> >> available for download. Thank you in advance. Best 
> >> wishes,
> >> 
> >> Mario Crespo Miguel
> >> 
> >> 
> > 
> > 
> 
> 
> 
> 


