16.1366, Sum: WebCorpus Counts

LINGUIST Network linguist at linguistlist.org
Fri May 6 16:42:00 UTC 2005


LINGUIST List: Vol-16-1366. Fri Apr 29 2005. ISSN: 1068 - 4875.

Subject: 16.1366, Sum: WebCorpus Counts

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Dooley, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Jessica Boynton <jessica at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 28-Apr-2005
From: Jerry Kurjian < jkurjian at mail.sdsu.edu >
Subject: WebCorpus Counts


-------------------------Message 1 ----------------------------------
Date: Fri, 29 Apr 2005 11:26:50
From: Jerry Kurjian < jkurjian at mail.sdsu.edu >
Subject: WebCorpus Counts


Regarding query: http://www.linguistlist.org/issues/16/16-1291.html#1

Below I summarize the comments of Andrew Kehoe and Antoinette Renouf
(5/27/2005), two of the creators of WebCorp, who kindly replied to my query
concerning WebCorp in thread 16.1291 and on the Corpora list (corpora AT uib.no):

Within a webpage, WebCorp gathers every KWIC (keyword in context) that
occurs on the page, unless the "one hit per page" option is checked. Across
webpages, WebCorp gathers hits from at most 200 webpages.  Getting fewer than
200 hits might mean that you have chosen to filter some features out, that
some of the 200 webpages were not accessible to WebCorp or had changed, or
that fewer than 200 pages contain the search term.
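
To make the counting behaviour concrete, here is a minimal sketch in Python.
It is not WebCorp's code: the function name collect_kwics, the width
parameter, the whole-word matching, and the convention of passing None for a
page that could not be fetched are all assumptions made for illustration.
Only the 200-page cap and the "one hit per page" option come from the
description above.

```python
import re

MAX_PAGES = 200  # page cap described in the summary above


def collect_kwics(pages, term, one_hit_per_page=False, width=30):
    """Return (page_index, kwic) pairs for `term` across `pages`.

    `pages` is a list of page texts; None stands for a page that could
    not be fetched (a hypothetical convention for this sketch).
    """
    # Whole-word, case-insensitive matching is an assumption of this sketch.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for index, text in enumerate(pages[:MAX_PAGES]):
        if text is None:
            # An inaccessible page contributes no hits.
            continue
        for match in pattern.finditer(text):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            hits.append((index, text[start:end]))
            if one_hit_per_page:
                # Keep only the first hit on this page.
                break
    return hits
```

Under these assumptions, a page with several occurrences yields several
hits unless one_hit_per_page is set, and pages that are missing or lack the
term simply lower the total, which is one way a search can return fewer
hits than expected.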

Finally, the authors say they are continuing to upgrade WebCorp, and in an
upcoming version plan to add frequency counts, type/token ratios,
collocation profiles, and "other statistics."

Linguistic Field(s): Text/Corpus Linguistics





-----------------------------------------------------------
LINGUIST List: Vol-16-1366





