[Corpora-List] CFP: Click Data Available Under License as part of Workshop on Web Search Click Data (WSCD 2009), Barcelona, Spain, Feb 9th

Rosie Jones jonesr at yahoo-inc.com
Wed Aug 13 10:01:23 UTC 2008


WSCD09: Workshop on Web Search Click Data 2009
http://research.microsoft.com/users/nickcr/wscd09/

 

Held in conjunction with WSDM 2009

http://www.wsdm2008.org/

 

February 9, 2009

Barcelona, Spain

 

Organizers

 

    * Nick Craswell, Microsoft

    * Rosie Jones, Yahoo! Labs

    * Georges Dupret, Yahoo! Labs

    * Evelyne Viegas, Microsoft

 

Workshop Overview

 

Research relating to search logs has been hampered by the limited
availability of click datasets. This workshop is a forum for new
research relating to Web search usage logs. It has an associated
dataset, the Microsoft 2006 RFP dataset, which will be made available to
participants (for free, but under license). Beyond work on this dataset,
the workshop may also serve as a forum for other new developments in the
area, and for discussing desirable properties of future search log
datasets.

 

Topics of interest include but are not restricted to:

 

    * web mining

    * information retrieval

    * learning to rank

    * desiderata for future click data releases

    * mining semantic relationships, for example within and between the
query set and document set

    * analysis and correction of biases in the data

    * clustering/grouping log data by: topic, task, geographic location,
time.

    * generative models for the log events, query text and/or document
text

    * other tasks which can be improved with the click data

 

The Dataset

 

MSN Search query log excerpt

 

    * 15 million queries

    * Sampled over one month

    * Queries from the US site (mostly English)

 

Per query attributes included:

 

   1. Session ID

   2. Time-stamp

   3. Query string

   4. Number of results on results page

   5. Results page number

 

Data per query for each result clicked:

 

   1. URL

   2. Associated query

   3. Position on results page

   4. Time-stamp
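
The two attribute lists above can be read as two record types. The following
is only a minimal sketch of that reading, not the actual file format of the
dataset: the class names, field names and field types are assumptions made
for illustration (for instance, time-stamps are kept as plain strings because
their format is not stated here), and the sample rows are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One logged query (hypothetical field names)."""
    session_id: str
    timestamp: str      # format not specified in the announcement
    query: str
    num_results: int    # number of results on the results page
    page_number: int    # results page number


@dataclass(frozen=True)
class ClickRecord:
    """One clicked result for a query (hypothetical field names)."""
    url: str
    query: str          # associated query
    position: int       # position on the results page
    timestamp: str      # format not specified in the announcement


def clicks_per_position(clicks):
    """Count how many clicks landed on each results-page position."""
    return Counter(click.position for click in clicks)


# Invented rows for illustration only; they are not taken from the dataset.
sample_clicks = [
    ClickRecord("http://example.com/a", "barcelona hotels", 1, "t1"),
    ClickRecord("http://example.com/b", "barcelona hotels", 3, "t2"),
    ClickRecord("http://example.com/c", "wsdm 2009", 1, "t3"),
]
counts = clicks_per_position(sample_clicks)
```

A position histogram like `counts` is a common first look at position bias,
which relates to the "analysis and correction of biases in the data" topic
listed earlier.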

 

Due to the type of assets under consideration, the principal
investigator will be asked to sign a data licensing agreement before
accessing the data. The terms of the license will allow for publication
of results but restrict redistribution of the data and publication of
detailed excerpts of the data.

 

Other click datasets may also be used, but it is desirable to show your
findings on the shared dataset where possible.

 

Maximum Number of Participants: 40

 

Activities: Presentation and poster sessions.

 

Proposals

 

To access the data, write a one-page abstract of your proposed
experiments using the data. We will check the proposals, collect the
necessary paperwork, and then deliver the data on CD.

 

Submission details to follow on workshop website:

http://research.microsoft.com/users/nickcr/wscd09/

 

Important Dates

 

    * Proposals: Wednesday, September 3, 2008

    * Response to proposals: Wednesday, September 10, 2008

    * Paper submission: Friday, December 5, 2008

    * Paper notification: Friday, January 2, 2009

    * Camera ready: January 12, 2009

    * Workshop: February 9, 2009

 
