[Corpora-List] Surprisingly large MI scores
Serge Sharoff
s.sharoff at leeds.ac.uk
Tue Sep 29 08:17:42 UTC 2009
Just to add another example:
http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25
I guess one reason for the difference is the way we count the number of
words in the corpus. I use the size returned by CWB, which counts every
token, punctuation marks included.
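
To illustrate how much the corpus-size convention can matter, here is a
minimal sketch. It assumes the usual pointwise MI formula,
log2(observed / expected) with expected = f(node) * f(collocate) * span / N;
whether either interface computes exactly this is an assumption. All counts
below are hypothetical and are not taken from the BNC, and mi_score is an
illustrative helper, not part of CWB.

```python
import math

def mi_score(pair_freq, node_freq, colloc_freq, corpus_size, span=1):
    """Pointwise MI as log2(observed / expected).

    Assumed formula: expected = node_freq * colloc_freq * span / corpus_size.
    """
    expected = node_freq * colloc_freq * span / corpus_size
    return math.log2(pair_freq / expected)

# Hypothetical counts, chosen only to show the effect of N.
pair_freq = 50            # co-occurrences of node and collocate
node_freq = 1_000         # frequency of the node word
colloc_freq = 2_000       # frequency of the collocate
n_words_only = 100_000_000    # corpus size excluding punctuation
n_all_tokens = 112_000_000    # corpus size including punctuation tokens

mi_words = mi_score(pair_freq, node_freq, colloc_freq, n_words_only)
mi_tokens = mi_score(pair_freq, node_freq, colloc_freq, n_all_tokens)

# With all other counts held fixed, the shift depends only on the ratio
# of the two corpus sizes: log2(n_all_tokens / n_words_only).
shift = mi_tokens - mi_words
print(f"MI (words only): {mi_words:.3f}")
print(f"MI (all tokens): {mi_tokens:.3f}")
print(f"difference:      {shift:.3f}")
```

Under these assumptions a larger N raises every MI score by the same
constant, so the ranking of collocates is unaffected. In practice the
co-occurrence counts may also change, since punctuation tokens can take up
positions in the collocation window.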
Serge
On Tue, 2009-09-29 at 00:03 +0100, Mark Davies wrote:
> One other page on similarities / discrepancies between MI scores from
> different corpus architectures and interfaces:
>
> http://corpus.byu.edu/collocates.asp
>
> As before, notice that even the outliers here are still "in the same
> ballpark", as opposed to other approaches that yield MI scores of 100
> or more.
>
>
> MD
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
>
> http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora