Spanish frequency counts

Maria Rosa Brea-Spahn mbrea1 at tampabay.rr.com
Thu Nov 11 14:22:59 UTC 2004


Michael,

There are two searchable Spanish databases available, though neither is
easy to acquire. I am currently using the LEXESP corpus from the
University of Barcelona, which meets some of your listed requirements.
It contains approximately 120,000 words. Syllable frequency is available
in that software program (CD-ROM), which, if you are interested, you
must order directly from Barcelona. The only other corpus I know to be
available is the Alameda and Cuetos corpus. You can acquire that program
by e-mailing Dr. Alameda.

Both of these databases cover Castilian Spanish. In using them, you must
also be aware that lemmas and their derived forms are included on the
same list. Therefore, if your intent is to use these databases to
compute the probabilities of sub-syllabic components, you must first
remove the derived forms; otherwise your calculations will be inflated,
because every inflected variant of a word contributes its shared
components again.
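
The clean-up step described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the rows, the
(surface form, lemma, syllables) layout, and the function names are
hypothetical and do not reflect the actual LEXESP or Alameda and Cuetos
file formats. Whole syllables stand in for sub-syllabic units to keep
the example short; an onset/rime split would slot into the same counting
step.

```python
from collections import Counter

# Hypothetical rows: (surface form, lemma, syllables).
# This layout is an assumption, not the real format of either database.
ENTRIES = [
    ("cantar", "cantar", ["can", "tar"]),
    ("canto", "cantar", ["can", "to"]),
    ("cantas", "cantar", ["can", "tas"]),
    ("mesa", "mesa", ["me", "sa"]),
    ("mesas", "mesa", ["me", "sas"]),
]


def keep_lemmas_only(entries):
    """Keep only rows whose surface form is the lemma itself."""
    return [row for row in entries if row[0] == row[1]]


def syllable_probabilities(entries):
    """Relative frequency of each syllable across the given rows."""
    counts = Counter(syl for _, _, syls in entries for syl in syls)
    total = sum(counts.values())
    return {syl: n / total for syl, n in counts.items()}


# Counting over every listed form versus over lemmas only.
raw = syllable_probabilities(ENTRIES)
clean = syllable_probabilities(keep_lemmas_only(ENTRIES))
print(raw["can"], clean["can"])
```

In this toy list the syllable "can" accounts for 3 of 10 syllables when
every inflected form is counted, but 1 of 4 once only lemmas remain,
which is the kind of inflation the clean-up is meant to avoid.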

Hope this is of some assistance,

Maria R. Brea-Spahn, M.S., CCC-SLP
Doctoral Candidate
Interdisciplinary Ph.D. Psychology and
Communication Sciences and Disorders
University of South Florida
Tampa, FL

Michael Ullman wrote:

>
>
> We are looking for a frequency dictionary/list for Spanish (preferably
> on-line or searchable database).
> We would prefer counts that distinguish different parts of speech, and
> that include
> lemma (root) counts as well as surface frequency counts.
> Ideally the counts would be based on a large corpus with a variety of
> international sources,
> not just from one region.
>
> Can anybody recommend such a frequency list?
> We would be interested in suggestions even if they do not fit all
> these criteria.
>
> Please email your suggestions to Harriet Wood Bowden at
> woodh at georgetown.edu
>
> Thanks very much,
>
> Harriet Wood Bowden
> Michael Ullman
> Georgetown University
>
>
>



More information about the Info-childes mailing list