<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.26.0">

</HEAD>

<BODY>

Just to add another example:<BR>

<A HREF="http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25">http://corpus.leeds.ac.uk/cgi-bin/cqp.pl?c=BNC&mistat=on&cleft=3&cright=3&cfilter=&searchtype=colloc&q=purple%25</A><BR>

<BR>

I guess one reason for the difference is the way we count the number of words in the corpus. I use the size returned by CWB, which counts all tokens, including punctuation marks.<BR>
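<BR>
A rough sketch of why the token count matters, assuming the usual formulation of MI as log2 of observed over expected co-occurrence. The function name, the span handling, and all counts below are made up for illustration; they are not real BNC figures and not necessarily what either interface computes:<BR>

```python
import math


def mi_score(cooc, f_node, f_colloc, corpus_size, span=1):
    """Mutual information as log2(observed / expected).

    Assumed formula: expected = f_node * f_colloc * span / corpus_size.
    """
    expected = f_node * f_colloc * span / corpus_size
    return math.log2(cooc / expected)


# Hypothetical counts, for illustration only.
cooc, f_node, f_colloc = 50, 1000, 2000

# Two ways of measuring the same corpus (both sizes are invented).
n_all_tokens = 112_000_000  # every token, punctuation included
n_words_only = 100_000_000  # punctuation excluded

mi_all = mi_score(cooc, f_node, f_colloc, n_all_tokens)
mi_words = mi_score(cooc, f_node, f_colloc, n_words_only)

# The gap depends only on the ratio of the two corpus sizes.
shift = mi_all - mi_words
print(f"MI (all tokens): {mi_all:.2f}")
print(f"MI (words only): {mi_words:.2f}")
print(f"difference:      {shift:.2f}")
```

Under this formula, a different corpus size moves every MI score by the same constant, log2 of the ratio of the two sizes, so the ranking of collocates is unchanged and the scores stay close to each other.<BR>
<BR>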

Serge<BR>


<BR>

On Tue, 2009-09-29 at 00:03 +0100, Mark Davies wrote:<BR>

<BLOCKQUOTE TYPE=CITE>

    One other page on similarities / discrepancies between MI scores from different corpus architectures and interfaces:<BR>

    <BR>

    <A HREF="http://corpus.byu.edu/collocates.asp">http://corpus.byu.edu/collocates.asp</A><BR>

    <BR>

    As before, notice that even the outliers here are still "in the same ballpark", as opposed to other approaches that yield MI scores of 100 or more.<BR>

    <BR>

<PRE>

MD


============================================

Mark Davies

Professor of (Corpus) Linguistics

Brigham Young University

(phone) 801-422-9168 / (fax) 801-422-0906


<A HREF="http://davies-linguistics.byu.edu">http://davies-linguistics.byu.edu</A>


** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **

============================================ 


_______________________________________________

Corpora mailing list

<A HREF="mailto:Corpora@uib.no">Corpora@uib.no</A>

<A HREF="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</A>

</PRE>

</BLOCKQUOTE>

</BODY>

</HTML>