[Corpora-List] Criteria for an ESP Vocabulary List

Emiliano Guevara emiliano.guevara at unibo.it
Thu Apr 24 06:10:37 UTC 2008


Hi,

unfortunately I don't have a very good answer to your question. Only  
a couple of remarks:

On 24 Apr 2008, at 07:41, True Friend wrote:

> Keyword generators work on the basis of frequency i.e. antconc and  
> wordsmith tools etc. They generate a list by comparing with  
> reference corpus a list of words having more frequency in  
> specialized corpus and less in reference corpus.

it's not just frequency: some statistical test of significance must  
be applied, and this is a crucial step. WSTools and AntConc use  
chi-square and/or log-likelihood to calculate this.
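
To make the point concrete, here is a minimal sketch of a log-likelihood keyness score in Python. It assumes the common two-term formulation (observed vs. expected frequency in the two corpora); it is not a claim about how WSTools or AntConc implement the test internally. The function names, the token lists and the 3.84 cut-off (the chi-square critical value for p < 0.05 with one degree of freedom) are my own illustrative choices.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one word.

    a: frequency in the specialized corpus, b: frequency in the reference
    corpus, c and d: total token counts of the two corpora.
    """
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero count contributes nothing (x * ln x tends to 0 as x tends to 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, threshold=3.84):
    """Words overused in the study corpus, sorted by descending score."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(ref.values())
    result = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively MORE frequent in the study corpus.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                result.append((word, score))
    return sorted(result, key=lambda pair: -pair[1])
```

A word with the same relative frequency in both corpora scores 0, however frequent it is, which is exactly why raw frequency alone is not enough.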

> Frequency basis is fine but Range has its importance i.e. if a word  
> is most frequent but used only in 10 files is less important then a  
> less frequent word found in more files.

I'm not sure that is called range, but anyway, I think the function  
"clumps" in WSTools 5 may be what you want to find words that are  
very frequent in just a few documents.
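
As a rough illustration of the frequency vs. number-of-files distinction, the sketch below counts, for every word, both its total frequency and the number of documents it occurs in, and then lists the words that are frequent overall but confined to few documents. It does not reproduce any WSTools function; the function names and the two thresholds (`min_freq`, `max_range_ratio`) are hypothetical and should be tuned to the corpus.

```python
from collections import Counter


def frequency_and_range(documents):
    """documents: a list of token lists, one per file.

    Returns two Counters: total frequency per word and the number of
    documents each word appears in.
    """
    freq = Counter()
    doc_count = Counter()
    for tokens in documents:
        freq.update(tokens)
        # set() makes each word count at most once per document.
        doc_count.update(set(tokens))
    return freq, doc_count


def concentrated_words(documents, min_freq=5, max_range_ratio=0.2):
    """Words with at least min_freq occurrences that appear in no more
    than max_range_ratio of the documents."""
    freq, doc_count = frequency_and_range(documents)
    n = len(documents)
    return sorted(
        w for w in freq
        if freq[w] >= min_freq and doc_count[w] / n <= max_range_ratio
    )
```

Reversing the second comparison gives the opposite filter: words that may be less frequent but are spread over many files.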

And, of course, if you make your own program you can do exactly as  
you want!!!

E.




****************************************
Emiliano R. Guevara
Facoltà di Lingue e Lett. Straniere
Dip. di Lingue e Lett. Straniere
Università di Bologna
Via Cartoleria 5 (40124) Bologna, Italia

Homepage: http://morbo.lingue.unibo.it/

E-mail:   emiliano.guevara at unibo.it
           emiguevara at gmail.com
****************************************


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


