[Corpora-List] Query on the use of Google for corpus research

Chris Jordan cjordan at cs.dal.ca
Fri May 27 13:05:22 UTC 2005


Oops,

typo in the URL of my last message. Sorry about that.

http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1998-014.html

Chris Jordan wrote:

> Hello,
>
> I would recommend looking at the following reference as it is highly
> related:
> Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael
> Moricz. Analysis of a Very Large AltaVista Query Log. Technical Note
> 1998-014, Digital SRC, 1998.
> http://gatekeeper.dec.com/pub/DEC/SRC/technicalnotes/abstracts/src-tn-1998-014.html
>
>
> There are some interesting issues with regard to examining such data.
> The first that really comes to mind is that you have to be able to
> distinguish between search sessions. This is non-trivial as users
> typically do not have a single goal when searching; there is some work
> by Spink on this topic. Gathering this query data on the client side
> and on the server side each comes with its own set of problems.
>
> As statistics are being gathered, it is important to discuss
> properties of the user group (sample population) being evaluated.
> The diversity of the sample (or lack of it) will determine what kind
> of conclusions can be drawn.
>
> Hope that helps,
>
> Chris
>
> Peter K Tan wrote:
>
>> Just forwarding a question from a colleague. Would be grateful for
>> comments.
>>
>> Cheers,
>> Peter
>>
>>     From: Michelle Maria Lazar
>>     Sent: 27 May 2005 11.27
>>     To: Peter K W Tan; Talib, I S; Vincent Ooi; Wee Hock Ann, Lionel
>>     Subject: Query on the use of Google for corpus research
>>
>>     Hi all,
>>          Someone has written to ask me whether there's any foreseeable
>>     problem/objection in using Google to gather statistical evidence
>>     on particular language usage, using key word searches. It involves
>>     a submission of an article currently under review. Does anyone
>>     have any experience/insight on this?
>>
>>     Cheers,
>>
>>     Michelle
>>
>


