<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.26.0">
</HEAD>
<BODY>
Just to add another example:<BR>
<A HREF="http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25">http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25</A><BR>
<BR>
I suspect one reason for the difference is the way we count the total number of words (N). I use the count returned by CWB, so it includes all tokens, punctuation marks among them.<BR>
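To illustrate how much the choice of N can matter, here is a minimal Python sketch. It assumes the textbook pointwise MI formula, MI = log2(O / E) with E = f(node) * f(collocate) / N and no window-span correction, which may not be exactly what CQP, BNCweb or the BYU interfaces compute. The function name and all frequencies below are hypothetical, chosen only for illustration.<BR>
<PRE>
```python
import math


def mutual_information(observed, f_node, f_collocate, corpus_size):
    """Pointwise MI in bits: log2(O / E), where E = f_node * f_collocate / N.

    Assumption: textbook formula without a window-span correction; real
    corpus tools may differ in such details.
    """
    expected = f_node * f_collocate / corpus_size
    return math.log2(observed / expected)


# Hypothetical frequencies for illustration only (not taken from the BNC).
observed, f_node, f_collocate = 30, 2000, 5000

# Same co-occurrence counts, two different notions of corpus size N.
n_all_tokens = 112_000_000  # hypothetical: every token, punctuation included
n_words_only = 98_000_000   # hypothetical: word tokens only

mi_tokens = mutual_information(observed, f_node, f_collocate, n_all_tokens)
mi_words = mutual_information(observed, f_node, f_collocate, n_words_only)

print(round(mi_tokens, 3), round(mi_words, 3))

# Only N changed, so the two scores differ by exactly log2(N_tokens / N_words).
print(round(mi_tokens - mi_words, 3), round(math.log2(n_all_tokens / n_words_only), 3))
```
</PRE>
For fixed frequencies, N enters the formula only through E, so counting punctuation (a larger N) shifts every MI score up by the same constant, log2(N_tokens / N_words), while the rank order of the collocates stays the same.<BR>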
Serge<BR>
<BR>
On Tue, 2009-09-29 at 00:03 +0100, Mark Davies wrote:<BR>
<BLOCKQUOTE TYPE=CITE>
One more page on similarities and discrepancies between MI scores from different corpus architectures and interfaces:<BR>
<BR>
<A HREF="http://corpus.byu.edu/collocates.asp">http://corpus.byu.edu/collocates.asp</A><BR>
<BR>
As before, notice that even the outliers here are still "in the same ballpark", as opposed to other approaches that yield MI scores of 100 or more.<BR>
<BR>
<PRE>
MD
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
<A HREF="http://davies-linguistics.byu.edu">http://davies-linguistics.byu.edu</A>
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
_______________________________________________
Corpora mailing list
<A HREF="mailto:Corpora@uib.no">Corpora@uib.no</A>
<A HREF="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</A>
</PRE>
</BLOCKQUOTE>
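On the remark above about MI scores of 100 or more: assuming the textbook pointwise MI formula, MI = log2(O * N / (f1 * f2)), and an observed count O that counts co-occurring pairs (so it cannot exceed f1 or f2), no score can be larger than log2(N). For a corpus of roughly 100 million tokens, about the size of the BNC, that ceiling is about 26.6, so values of 100 or more presumably come from a different formula or scaling. A minimal sketch; the helper names are hypothetical:<BR>
<PRE>
```python
import math


def pmi(observed, f1, f2, corpus_size):
    """Textbook pointwise MI in bits: log2(O * N / (f1 * f2))."""
    return math.log2(observed * corpus_size / (f1 * f2))


def mi_upper_bound(corpus_size):
    """Largest pointwise MI that any attested pair can reach.

    O cannot exceed min(f1, f2), so O * N / (f1 * f2) is at most
    N / max(f1, f2), which is at most N. The bound log2(N) is reached
    when O = f1 = f2 = 1 (a pair of hapaxes that occur together).
    """
    return math.log2(corpus_size)


# Hypothetical corpus size of roughly BNC scale (about 100 million tokens).
n_tokens = 100_000_000
bound = mi_upper_bound(n_tokens)
print(round(bound, 2))  # prints 26.58

# A hapax pair hits the ceiling; a more frequent pair stays below it.
print(pmi(1, 1, 1, n_tokens) == bound, pmi(5, 10, 20, n_tokens))
```
</PRE>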
</BODY>
</HTML>