Corpora: Help on Frequencies
Pascual Cantos
pcantos at fcu.um.es
Fri Oct 5 09:40:59 UTC 2001
Dear List Members,
Many corpus-based applications in foreign language materials and dictionary
making, among others, rely mostly on raw frequencies (absolute and/or
relative frequencies) of word forms, lemmas, bigrams, etc. Frequency
indices are taken into account in order to decide whether an item should be
considered or not.
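To make the two kinds of raw frequency concrete, here is a minimal sketch in Python. The toy sentence, the whitespace tokenisation and the helper names `frequencies` and `bigrams` are illustrative assumptions only, not taken from any particular corpus tool.

```python
from collections import Counter

def frequencies(items):
    """Map each item to (absolute count, relative frequency)."""
    counts = Counter(items)
    total = sum(counts.values())
    # Relative frequency = absolute count / total number of items.
    return {item: (n, n / total) for item, n in counts.items()}

def bigrams(tokens):
    """Return adjacent token pairs, in order of occurrence."""
    return list(zip(tokens, tokens[1:]))

# Hypothetical toy "corpus", tokenised naively on whitespace.
tokens = "the cat sat on the mat and the dog sat too".split()

word_freqs = frequencies(tokens)             # word-form frequencies
bigram_freqs = frequencies(bigrams(tokens))  # bigram frequencies

for word, (absolute, relative) in sorted(word_freqs.items()):
    print(f"{word}\t{absolute}\t{relative:.3f}")
```

Both figures depend on nothing but the counts and the corpus size, so they say nothing about how an item is distributed across texts.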
And here are my questions:
What exactly do frequencies tell us?
And, more interestingly, what do they hide?
How misleading/erroneous can they be?
How far can we rely on them?
What other features/aspects/measures should also be considered?
Are there ways/techniques to "correct" frequency indices statistically?
I would greatly appreciate ideas, comments and literature on this issue.
I also promise to send a summary of all mails received.
Regards and a million thanks
Pascual
-----------------------------------------------------
Dr. Pascual Cantos Gómez
Departamento de Filología Inglesa
Universidad de Murcia
C/. Santo Cristo, 1
30071 Murcia (Spain)
Tel.: +34 968 364365
Fax: +34 968 363185
E-mail: pcantos at fcu.um.es
http://www.um.es/lacell/miembros/pcg/