[Corpora-List] Criteria for an ESP Vocabulary List
Emiliano Guevara
emiliano.guevara at unibo.it
Thu Apr 24 06:10:37 UTC 2008
Hi,
unfortunately I don't have a very good answer to your question. Only
a couple of remarks:
On 24 Apr 2008, at 07:41, True Friend wrote:
> Keyword generators work on the basis of frequency i.e. antconc and
> wordsmith tools etc. They generate a list by comparing with
> reference corpus a list of words having more frequency in
> specialized corpus and less in reference corpus.
It's not just frequency: some statistical test of significance must
be applied, and this is a crucial step. WSTools and AntConc use
Chi-square and/or Log-likelihood to calculate this.
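
As a rough illustration of the significance step, here is a minimal
Python sketch of a log-likelihood keyness score for one word in a
specialised corpus versus a reference corpus. It uses the common
two-corpus formulation (observed versus expected frequencies); it is
not taken from WSTools or AntConc, and the function names, the
"overused only" filter and the default threshold of 3.84 (the
chi-square critical value for p < 0.05 at 1 degree of freedom) are
assumptions made for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score for a single word.

    a, b: frequency of the word in the specialised / reference corpus
    c, d: total number of tokens in the specialised / reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (avoids log(0)).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(spec_tokens, ref_tokens, threshold=3.84):
    """Return (word, score) pairs overused in the specialised corpus.

    The threshold and the overuse filter are illustrative choices.
    """
    spec, ref = Counter(spec_tokens), Counter(ref_tokens)
    c, d = sum(spec.values()), sum(ref.values())
    scored = []
    for word, a in spec.items():
        b = ref.get(word, 0)
        # Keep only words relatively more frequent in the specialised corpus.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                scored.append((word, score))
    # Highest keyness first.
    return sorted(scored, key=lambda item: -item[1])
```

Called as keywords(specialised_tokens, reference_tokens) on two token
lists, this returns the candidate keywords sorted by score. A word
with the same relative frequency in both corpora scores 0.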
> Frequency basis is fine but Range has its importance i.e. if a word
> is most frequent but used only in 10 files is less important then a
> less frequent word found in more files.
I'm not sure that is called range, but anyway, I think the function
"clumps" in WSTools 5 may be what you want to find words that are
very frequent in just a few documents.
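
For the file-count criterion raised in the quoted message (what it
calls Range), a minimal Python sketch could look like the following.
It assumes each file has already been tokenised into a list of words;
the function names and the min_range parameter are hypothetical, and
this is not how "clumps" or any WSTools function works.

```python
from collections import Counter


def frequency_and_range(documents):
    """Count total frequency and number of documents for each word.

    documents: an iterable of token lists, one list per file.
    """
    freq = Counter()
    doc_range = Counter()
    for tokens in documents:
        freq.update(tokens)
        # set() makes each word count at most once per document.
        doc_range.update(set(tokens))
    return freq, doc_range


def filter_by_range(documents, min_range):
    """Return (word, frequency, range) for words found in >= min_range files."""
    freq, doc_range = frequency_and_range(documents)
    rows = (
        (word, freq[word], doc_range[word])
        for word in freq
        if doc_range[word] >= min_range
    )
    # Widest range first, then highest frequency, then alphabetical.
    return sorted(rows, key=lambda row: (-row[2], -row[1], row[0]))
```

With this, a word that is very frequent but confined to a few files
can be dropped or ranked lower than a less frequent word that is
spread over many files.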
And, of course, if you make your own program you can do exactly as
you want!!!
E.
****************************************
Emiliano R. Guevara
Facoltà di Lingue e Lett. Straniere
Dip. di Lingue e Lett. Straniere
Università di Bologna
Via Cartoleria 5 (40124) Bologna, Italia
Homepage: http://morbo.lingue.unibo.it/
E-mail: emiliano.guevara at unibo.it
emiguevara at gmail.com
****************************************
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora