[Corpora-List] Surprisingly large MI scores

Serge Sharoff s.sharoff at leeds.ac.uk
Tue Sep 29 08:17:42 UTC 2009


Just to add another example:
http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25

I guess one reason for the difference is the way we count the number of
words.  I use the count returned by CWB, which covers all tokens,
including punctuation marks.
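
To illustrate how the choice of corpus size feeds into the score, here is a
minimal sketch, assuming the common pointwise formula
MI = log2(f(x,y) * N / (f(x) * f(y))) with no span adjustment.  All the
frequencies and both corpus sizes below are made-up numbers for illustration,
not figures from the BNC or from either interface, and the function name
mi_score is hypothetical.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency."""
    return math.log2((f_xy * n) / (f_x * f_y))


# Hypothetical frequencies for a node word, a collocate, and their co-occurrence.
f_xy, f_x, f_y = 30, 1_200, 900

# Hypothetical corpus sizes: words only vs. all tokens (punctuation included).
n_words = 100_000_000
n_tokens = 115_000_000

mi_words = mi_score(f_xy, f_x, f_y, n_words)
mi_tokens = mi_score(f_xy, f_x, f_y, n_tokens)

# With the other frequencies held fixed, the gap depends only on the size ratio.
shift = mi_tokens - mi_words

print(f"MI with word count:  {mi_words:.3f}")
print(f"MI with token count: {mi_tokens:.3f}")
print(f"Shift:               {shift:.3f}")
```

Under these assumptions the shift is the same for every collocate, namely
log2(n_tokens / n_words), so counting punctuation raises every score by one
constant amount.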
Serge


On Tue, 2009-09-29 at 00:03 +0100, Mark Davies wrote:

> One other page on similarities / discrepancies between MI scores from
> different corpus architectures and interfaces:
> 
> http://corpus.byu.edu/collocates.asp
> 
> As before, notice that even the outliers here are still "in the same
> ballpark", as opposed to other approaches that yield MI scores of 100
> or more.
> 
> 
> MD
> 
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> 
> http://davies-linguistics.byu.edu
> 
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================ 
> 
> 
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora