8.930, Sum: Frequency

linguist at linguistlist.org
Thu Jun 26 17:10:55 UTC 1997


LINGUIST List:  Vol-8-930. Thu Jun 26 1997. ISSN: 1068-4875.

Subject: 8.930, Sum: Frequency

Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>

Review Editor:     Andrew Carnie <carnie at linguistlist.org>

Associate Editors: Ljuba Veselinova <ljuba at linguistlist.org>
                   Ann Dizdar <ann at linguistlist.org>
Assistant Editor:  Sue Robinson <sue at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Zhiping Zheng <zzheng at online.emich.edu>

Home Page:  http://linguistlist.org/


Editor for this issue: Ann Dizdar <ann at linguistlist.org>

=================================Directory=================================

1)
Date:  Mon, 16 Jun 1997 11:15:56 +0100
From:  Marcial.Terradez at uv.es
Subject:  sum:frequency

-------------------------------- Message 1 -------------------------------

Date:  Mon, 16 Jun 1997 11:15:56 +0100
From:  Marcial.Terradez at uv.es
Subject:  sum:frequency

Some weeks ago, I posted a query on the LINGUIST list about frequency
vocabularies for English, French, German and Spanish. Many people
responded with helpful comments, which are summarised below.

Thanks to everybody who wrote to me. Your suggestions and information
are very important for my work.


My name is Erik Willis and I attend Brigham Young University as a
Master's student in Spanish.  One of our professors, Orlando Alba
(Orlando_Alba at byu.edu), is very active in frequency counts.  I
know his teacher, Humberto Lopez Morales, was also very active in that
field.  Their respective corpora are based on the Dominican Republic,
Puerto Rico and, I believe, Mexico, and on lexical availability (lexico
disponible).  So far I do not think they have anything on the net.  The
person who knows the net resources best is Francisco Marcos Marin at
the Autonoma de Madrid.  I do not have his e-mail.  I am also working
with frequency counts, but at a phonological level.  I am looking at
written and oral narratives, which I believe has not been done.  I hope
we can help each other with bibliographies, etc.

Erik Willis
willisew at itsnet.com

- -------------------------------------------------
Dear Marcial:
	Several counts already exist, among them:

	Helen Eaton, ca. 194?. (I forget the title, but it is something
like: Frequency counts in 5 European languages.  I do not know who
published it originally, but Dover Press reissued it in paperback
around the 60s or 70s.)
	Luis Fernando Lara at El Colegio de México has done a great deal
along these lines (based on texts selected from a total of [I believe] 2
million words of running text).  He is at the DEM [Diccionario del
Español de México], and is currently the director of the CELL [Centro
de Estudios de Lingüística y Literatura] of El Colegio de México
(e-mail: lara at colmex.mx, although I am not 100% sure of the
prefix).  He can advise you a great deal on this.
There are also many corpus-analysis researchers in Spain itself,
although I cannot recall their names at the moment.
	In the medium term I will undertake a project with a similar
purpose, but with a corpus of gigawords, in order to investigate the
use of word forms (e.g., the future subjunctive, etc.) in some detail,
as well as proper names, etc.  However, I have not done much on it to
date.

Jim
James L. Fidelholtz                 e-mail: jfidel at siu.cen.buap.mx
Área de Ciencias del Lenguaje       or:     jfidel at cca.pue.udlap.mx
Instituto de Ciencias Sociales y Humanidades
Universidad Autónoma de Puebla, México
- -------------------------------------------------------------
Dear Marcial,

A colleague of mine at the Universidad de Oviedo has just published a
frequency dictionary of Spanish.  His address is: Jose Ramon Alameda
<jalameda at sci.cpd.uniovi.es>.  As for the dictionary you are going
to compile, do you plan to tag the words?  That is, will you
distinguish between the occurrences of 'casa' that come from the noun
'casa' and those that come from the verb 'casar'?

David Eddington
Mississippi State University
- --------------------------------------------------
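
The tagging question in David Eddington's message (whether occurrences
of 'casa' are counted by surface form or split between the noun 'casa'
and the verb 'casar') can be illustrated with a small Python sketch.
It is only an illustration: the hand-tagged tokens, the tag names and
the function names are hypothetical and do not come from any of the
dictionaries mentioned in this summary.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (surface form, lemma, part of speech).
tagged_tokens = [
    ("casa", "casa", "NOUN"),   # "la casa"
    ("casa", "casar", "VERB"),  # "se casa"
    ("casa", "casa", "NOUN"),
    ("la", "el", "DET"),
]

def surface_counts(tokens):
    """Count occurrences of each surface form, ignoring the tags."""
    return Counter(form for form, _lemma, _pos in tokens)

def tagged_counts(tokens):
    """Count occurrences of each (lemma, part of speech) pair."""
    return Counter((lemma, pos) for _form, lemma, pos in tokens)

surface = surface_counts(tagged_tokens)
tagged = tagged_counts(tagged_tokens)
```

An untagged list reports one figure for 'casa', while the tagged count
keeps the noun and the verb apart, which is the distinction the
question asks about.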

I used two frequency lists in research I conducted almost 20 years
ago.  One is the Keniston List: 2000 words, divided into groups of 500,
ranked by frequency in print in Peninsular Spanish.  The other is the
Rodriguez Bou list, which gives the frequency of words in print for
Puerto Rican Spanish.

Joel Walters
Department of English
Bar-Ilan University
Ramat Gan, Israel
- ----------------------------------------------------

I produced the frequency list for Longman's Dictionary.  Both the
paper and assorted frequency lists are available from my web page (see
below).

If you have trouble accessing the paper, feel free to email me again
and I'll send it.

	Happy surfing,

		Adam

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow                         tel: (44) 1273 642919
Information Technology Research Institute           (44) 1273 642900
University of Brighton                         fax: (44) 1273 642908
Lewes Road
Brighton BN2 4GJ         email:      Adam.Kilgarriff at itri.bton.ac.uk
UK                       http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- -------------------------------------------------------------------
Log in by anonymous ftp to ftp-lsi.upc.es
 and change to the directory pub/lluisp

  There you will find the files

    spanish.freq      (Spanish word frequencies
			drawn from a corpus of 3M words)

    wsj.freq     (English word frequencies drawn
                   from 1.1M words of the WSJ)

  You will need to uudecode and gunzip the files.


	Good luck,

		Lluis Padro
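
For readers without the uudecode and gunzip command-line tools, the
two decoding steps above might be reproduced in Python roughly as
follows.  This is a minimal sketch under assumptions: it supposes the
file has already been fetched, that it is a standard uuencoded file
with "begin" and "end" lines wrapping gzip data, and the function
names are hypothetical.  Nothing is assumed about the layout of the
decoded frequency list itself.

```python
import binascii
import gzip

def uudecode(raw: bytes) -> bytes:
    """Decode the lines between 'begin' and 'end' of a uuencoded file."""
    out = bytearray()
    in_body = False
    for line in raw.splitlines():
        if not in_body:
            # Skip everything up to and including the 'begin' header.
            if line.startswith(b"begin "):
                in_body = True
            continue
        stripped = line.strip()
        if stripped == b"end":
            break
        if stripped in (b"", b"`"):
            # Zero-length lines carry no data.
            continue
        out += binascii.a2b_uu(line)
    return bytes(out)

def decode_frequency_file(raw: bytes) -> bytes:
    """Uudecode first, then gunzip, in the order given in the message."""
    return gzip.decompress(uudecode(raw))
```

Given the bytes of a downloaded file, decode_frequency_file returns the
uncompressed contents as bytes; choosing a text encoding for them is
left to the reader, since the message does not state one.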
- -----------------------------------------------
Hello Marcial:

Although you very probably have them already, I am sending you the
references I have to hand on lexical frequencies in Spanish, in case
they can help you:

PATTERSON, William; and URRUTIBEHEITY, Hector, _The Lexical Structure of
Spanish_, Mouton, The Hague-Paris, 1975.

JUILLAND, Alphonse; and CHANG-RODRIGUEZ, Eugenio, _Frequency dictionary
of Spanish words_, Mouton, London-The Hague-Paris, 1964.

PATTERSON, William T., "On the genealogical structure of the Spanish
vocabulary", in ???, pp. 309-339.

GARCIA HOZ, Victor, _Estudios experimentales sobre el vocabulario_,
CSIC, Madrid, 1977.

______________________
Javier Gomez Guinovart  <uvifejgg at cesga.es>
http://www.uvigo.es/departamentos/dep/h06/webh06/sli/index.html
Univ. de Vigo - Fac. de Humanidades - Apartado 874 - E-36200 Vigo
Tel: +34+86+812360 - Fax: +34+86+812380
- ---------------------------------------------------------
I have a copy of:

An English-French-German-Spanish Word Frequency Dictionary
Subtitle: A correlation of the first 6000 words in four
single-language frequency lists
Compiled by Helen S. Eaton, Teachers College, Columbia Univ;
   visiting instructor, Univ of New Mexico;
   Diplomee, Sorbonne, Universite de Paris
441 pages, paperback, Dover Publications, Inc, New York.
copyright 1940, 1967 by Helen S. Eaton

There are separate indexes for English, French, German and Spanish
words.
Appendix II is a conceptual analysis of substantives, verbs and
adjectives in the list.

Pub in Canada by General Publ Co Ltd, 30 Lesmill Road, Don Mills,
Toronto, Ontario
Pub in UK by Constable and Co, Ltd, 10 Orange St, London, W.C. 2
Pub in US by Dover Publications Inc, 180 Varick St, New York, NY 10014
LCCN: 61-4487

/s/ Israel Cohen
New Dimension Software Ltd
izzy at telaviv.ndsoft.com

---------------------------------------------------------------------------
LINGUIST List: Vol-8-930