[Corpora-List] Spanish reference corpus
Adam Kilgarriff
adam at lexmasterclass.com
Fri Feb 2 07:54:00 UTC 2007
Mario,
Yes, the frequencies etc are available for this corpus via the Sketch
Engine, a corpus query tool which allows the user to specify and collect
frequency lists to a wide range of specifications (as well as offering a
range of other functions including concordancing, 'word sketches' and a
distributional thesaurus).
We have taken the URL list as supplied by Serge Sharoff, re-collected the
corpus (or, at least, a 95% similar corpus) and installed it into the Sketch
Engine. You can self-register for a trial account at
http://www.sketchengine.co.uk
Enjoy!
Adam
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Mario Crespo Miguel
Sent: 01 February 2007 13:17
To: s.sharoff at leeds.ac.uk
Cc: corpora at lists.uib.no
Subject: Re: [Corpora-List] Spanish reference corpus
Thank you very much for helping me, but I think it would be more
convenient for me if the word frequencies of this open-domain /
general corpus could be obtained. Does anybody know whether such
information is available in some way? Best,
Mario
On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff at leeds.ac.uk>
wrote:
> one answer is the Spanish Internet corpus with the interface from
> http://corpus.leeds.ac.uk/internet.html
> and the URL list
> http://corpus.leeds.ac.uk/internet/final-url-es.gz
>
> This is a random snapshot of the Spanish Internet of about 120 million
> words, see
> Sharoff, S (2006) Creating general-purpose corpora using automated
> search engine queries. In Marco Baroni and Silvia Bernardini, editors,
> WaCky! Working papers on the Web as Corpus. Gedit, Bologna.
> http://wackybook.sslmit.unibo.it/
>
> S
>
> On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
>> Dear everybody,
>>
>> Thank you again for all the help that I always get from this
>> mailing list. This time I would like to ask whether there is
>> a reference / open-domain corpus for Spanish which is freely
>> available and can be downloaded. Thank you in advance. Best
>> wishes,
>>
>> Mario Crespo Miguel