John,<div><br></div><div>this is less a bug than a knotty tokenisation problem. For most linguistic purposes it is appropriate to tokenise <i>cannot</i> as two words, so that is what we have done, which is why its occurrences are counted under <i>can not</i>. Can't please all the people all the time ...</div>
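<div><br></div><div>To illustrate the effect, here is a minimal sketch. It is not the tokeniser actually used in Looglefight; the <code>tokenize</code> and <code>count_phrase</code> helpers and the sample sentence are made up for the example. Once <i>cannot</i> is split into two tokens, a search for the single word finds nothing, and every instance is counted under <i>can not</i>.</div>

```python
import re

# Hypothetical rule: split the single orthographic word "cannot" into two tokens.
# Illustration only; not the tokeniser of any real corpus tool.
CANNOT = re.compile(r"\b(can)(not)\b", re.IGNORECASE)


def tokenize(text):
    """Lowercase word tokeniser that splits 'cannot' into 'can' + 'not'."""
    split_text = CANNOT.sub(r"\1 \2", text)
    return re.findall(r"[a-z']+", split_text.lower())


def count_phrase(tokens, phrase):
    """Count occurrences of a whitespace-separated phrase in a token list."""
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


# Made-up sample text containing both spellings.
sample = "I cannot go. You can not stay. We cannot wait."
tokens = tokenize(sample)

cannot_count = count_phrase(tokens, "cannot")    # single-token query
can_not_count = count_phrase(tokens, "can not")  # two-token query
print(cannot_count, can_not_count)
```

<div>With this rule the single-token query can never match, so a tool that wanted to report the two spellings separately would have to keep the original word form alongside the tokens.</div>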
<div><br></div><div>Adam<br><br><div class="gmail_quote">On 2 October 2010 03:26, John F. Sowa <span dir="ltr">&lt;<a href="mailto:sowa@bestweb.net">sowa@bestweb.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Another bug in Looglefight:<br>
<br>
I checked both Googlefight and Looglefight for the occurrences of<br>
'cannot' vs. 'can not'. According to Googlefight, there are about<br>
50 times more occurrences of 'can not' than 'cannot'.<br>
<br>
But Looglefight said there were 0 occurrences of 'cannot',<br>
but 13,832 occurrences of 'can not'.<br>
<br>
So I checked the concordance for 'can not' and found that<br>
Looglefight mixed all occurrences of 'cannot' and 'can not'<br>
in the column for 'can not'.<br><font color="#888888">
<br>
John Sowa</font><div><div></div><div class="h5"><br>
<br>
_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>================================================<br>Adam Kilgarriff <a href="http://www.kilgarriff.co.uk">http://www.kilgarriff.co.uk</a> <br>
Lexical Computing Ltd <a href="http://www.sketchengine.co.uk">http://www.sketchengine.co.uk</a><br>Lexicography MasterClass Ltd <a href="http://www.lexmasterclass.com">http://www.lexmasterclass.com</a><br>
Universities of Leeds and Sussex <a href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</a><br>================================================<br>
</div>