[Corpora-List] Weblogs Corpus + 2nd CFP for the Int. Conference on Weblogs and Social Media (ICWSM)

Nicolas Nicolov Nicolas at umbrialistens.com
Fri Sep 15 16:56:09 UTC 2006


=============================================
Int. Conference on Weblogs and Social Media
March 26-28, 2007
Boulder, Colorado, U.S.A.
www.icwsm.org
=============================================

Availability of Data

Continuing the tradition from the WWE'06 
workshop, we are once again offering a large 
blog dataset to conference participants. The 
data release comprises a complete set of 
weblog posts collected by Nielsen BuzzMetrics 
for May 2006 (consisting of about 14M posts 
from 3M weblogs). The data set includes the 
full content of the posts plus mark-up and 
represents an unprecedented collection for 
blog researchers. Our hope is that a communal
dataset, approached from many different 
directions, will yield many interesting 
results. More information on the dataset, 
which is available for immediate download,  
can be found at: 
http://www.icwsm.org/data.html 

Call for Papers

Recent years have seen a flourishing of social 
media - the promise of the WWW coming to fruition. 
Across the world, individuals can share opinions, 
experiences and expertise at the push of a button. 
There has been a fundamental shift thanks to 
significant advances in the ease of publishing 
content. Creating web content was for years the 
domain of tech-savvy people; now the barrier has 
been torn down.

Perhaps the most visible among the successes of 
social media in recent years is the blogosphere. 
Tens of thousands of new blogs are created every day;
blog content is becoming ubiquitous, surfacing 
in news portals, search results and corporate 
public relations. Even those who are unaware of the 
blogosphere are still influenced by its content. 
Although blogs are highly visible currently, other 
forms of conversational spaces continue to flourish, 
especially message boards, mailing lists, review 
sites and Usenet.

Social media covers all forms of sharing: from 
photos, to videos, to recommendations. In the past 
few years, many examples of social media have 
become hugely successful. Flickr is a premier photo 
sharing site; del.icio.us has become a touchstone 
for sharing recommendations of websites; Web 2.0 
applications in general abound with newcomers in 
the social media space.

One of the fascinating aspects of social media 
has been the drive from within to study the 
ecology as it evolves. People act at once as 
creators, observers and influencers of the space 
in which they participate. At the same time, 
businesses are quickly grasping the potential 
benefit of attending to the new space of social 
media. Monitoring the aggregate trends and 
opinions revealed by social media provides 
valuable insight for a number of business 
applications: marketing intelligence and 
competitive intelligence.

The fast-growing blogosphere and social media space 
is a fruitful area for investigations across many 
disciplines. For example:

  * Natural language processing and machine learning 
    researchers study the extraction of factual 
    information from text; can blogs be processed in 
    a robust manner and can knowledge bases be 
    populated with facts from blogs?
  * Social network researchers and graph theory 
    researchers are concerned with inferring 
    community structure; analyzing the linkage 
    patterns among blog entries can provide explicit 
    community structure; can we infer implicit 
    communities through the content of the blogs?
  * Political scientists are looking at ways of 
    identifying influencers in a community; who are 
    the influential bloggers whose voice is echoed 
    by others?
  * Multimedia researchers are attempting to 
    categorize audio and video content and aggregate 
    information from diverse sources (textual, audio, 
    video); can visual and audio social media be stored 
    in a way that allows search across different 
    modalities?
  * Market analysis researchers are concerned with 
    what people think of the products and services 
    of a company; can we process blogs automatically 
    to find consumer complaints and breaking reports 
    about product vulnerabilities? Also, when does 
    a burst of blogging activity become a trend?
  * Social psychologists study the response to 
    current events, including emotional and 
    attitudinal dimensions as well as content and 
    patterns of influence.

Despite the growing relevance of blogs and social 
media, existing research has only begun to address 
the spectrum of issues that arise in their analysis. 
Blogs, for example, are a different kind of document 
than the relatively clean text that NLP research is 
based on. Such differences in terms of structure, 
content and grammaticality will be a challenge, 
considering that blogs will likely become the most 
common form of publicly accessible personal expression.


AREAS OF INTEREST

The conference aims to bring together researchers 
from different subject areas (e.g., computer science,
linguistics, psychology, statistics, sociology, 
multimedia and semantic web technologies) and foster 
discussions about ongoing research in the following 
areas:

[01] AI methods for ethnographic analysis through 
     social media.
[02] Blogosphere vs. mediasphere; measuring the 
     influence of blogs on the media.
[03] Centrality/influence of bloggers/blogs; ranking/
     relevance of blogs; web page ranking based on 
     blogs.
[04] Crawling/spidering and indexing.
[05] Human Computer Interaction; social media tools; 
     navigation.
[06] Multimedia; audio/visual processing; aggregating 
     information from different modalities.
[07] Semantic analysis; cross-system and cross-media 
     name tracking; named relations and fact 
     extraction; discourse analysis; summarization.
[08] Semantic Web; unstructured knowledge management.
[09] Sentiment analysis; polarity/opinion 
     identification and extraction.
[10] Social Network Analysis; community 
     identification; expertise discovery; 
     collaborative filtering.
[11] Text categorization; gender/age identification; 
     spam filtering.
[12] Time Series Forecasting; measuring 
     predictability of phenomena based on social 
     media.
[13] Trend identification/tracking.
[14] Visualization, aggregation and filtering.
[15] New social media applications, interfaces, 
     interaction techniques.

IMPORTANT DATES

Submissions:  December 8, 2006
Acceptance Notifications:  February 2, 2007
Camera ready copies:  February 16, 2007
Tutorials:  March 25, 2007
Conference:  March 26-28, 2007


SUBMISSION

People interested in participating should submit 
through the conference website a technical paper 
(up to 8 pages), a short paper (up to 4 pages), 
a poster or demo description (up to 2 pages) 
by midnight (PST) on Dec 8, 2006. Each submission 
should, to the extent possible, indicate a list of 
relevant areas from the list above (e.g., 03, 04, 10).


CHAIRS

  * Natalie Glance, Nielsen BuzzMetrics.
  * Nicolas Nicolov, Umbria Inc.


CO-CHAIRS

  * Eytan Adar, Univ. of Washington.
  * Matthew Hurst, Nielsen BuzzMetrics.
  * Mark Liberman, Univ. of Pennsylvania.
  * Franco Salvetti, Univ. of Colorado at Boulder &
    Umbria Inc.

LOCAL CHAIR

  * James H. Martin, Univ. of Colorado at Boulder.

PROGRAM COMMITTEE

  * Paolo Avesani, ITC-irst, Italy
  * Bran Boguraev, IBM Research, USA
  * Chris Brooks, Univ. of San Francisco, USA
  * Claire Cardie, Cornell Univ., USA
  * Scott Carter, UC Berkeley, USA
  * Steve Cayzer, HP Labs Bristol, UK
  * Thierry Declerck, DFKI Language Lab, Germany
  * Donghui Feng, ISI, USC, USA
  * Tim Finin, UMBC, USA 
  * Kathy Gill, Univ. of Washington, USA
  * Michelle Gumbrecht, Stanford Univ., USA
  * John Henderson, MITRE, USA
  * Eduard Hovy, ISI, USC, USA
  * Jussi Karlgren, SICS, Sweden
  * Laura Knudsen, OSC, USA
  * Moshe Koppel, Bar-Ilan Univ., Israel
  * Cameron Marlow, Yahoo! Research, USA
  * Lluis Marquez, Univ. Poli. de Catalunya, Spain
  * Rada Mihalcea, Univ. of North Texas, USA
  * Gilad Mishne, Univ. of Amsterdam, The Netherlands
  * Tomoyuki Nanno, Google, Japan
  * Apostol Natsev, IBM Research, USA
  * Kamal Nigam, Google, USA
  * Peter Norvig, Google, USA
  * Jon Oberlander, Univ. of Edinburgh, Scotland
  * Peter Pirolli, PARC, USA
  * Oana Postolache, Univ. of Saarland, Germany
  * John Prager, IBM Research, USA
  * Alessandro Provetti, Univ. of Messina, Italy
  * Drago Radev, Univ. of Michigan, USA
  * Jonathon Read, Univ. of Sussex, UK
  * Maarten de Rijke, Univ. of Amsterdam, The Netherlands
  * Laura Ripamonti, Univ. of Milan, Italy
  * Irina Rish, IBM Watson Research Center, USA
  * Dan Roth, Univ. of Illinois at Urbana-Champaign, USA
  * James G. Shanahan, Turn Inc., USA
  * Emma Shen, OSC, USA
  * Suresh Sood, Univ. of Tech. Sydney, Australia
  * Savitha Srinivasan, IBM Research, USA
  * Carlo Strapparava, ITC-irst, Italy
  * V.S. Subrahmanian, Univ. of Maryland, USA
  * Belle Tseng, NEC Labs America, USA
  * Janyce M. Wiebe, Univ. of Pittsburgh, USA
  * Tong Zhang, Yahoo! Research, USA
  * Liang Zhou, ISI, USC, USA
  * Ethan Zuckerman, Harvard Univ., USA


VENUE

The conference will take place at the Marriott Boulder
(http://marriott.com/property/propertypage/DENBO) 
located near downtown Boulder, Colorado.

SPONSORS

ICWSM is proud to be supported by:

  * Google, Inc.
  * Microsoft Live Labs
  * NEC Labs America
  * Sphere

and

  * Nielsen BuzzMetrics
  * Umbria, Inc.
  * University of Pennsylvania
  * University of Maryland, Baltimore County

ICWSM is an IW3C2-endorsed conference 
(http://www.iw3c2.org/).


HISTORY

The International Conference on Weblogs and Social 
Media grew out of two events: the annual series of 
Workshops on the Weblogging Ecosystem (WWE 2006, 
WWE 2005, WWE 2004) held in conjunction with the 
International World Wide Web Conference and the 
Spring Symposium organized by the American 
Association for Artificial Intelligence (AAAI) 
on Computational Approaches to Analyzing Weblogs 
(CAAW 2006).


CONTACT

  info (at) icwsm dot org




Best wishes
Nicolas
---
Dr Nicolas Nicolov
Chief Scientist
Umbria Inc.
1655 Walnut St, Suite 300
Boulder, CO 80302, U.S.A.
Tel: (310) 754-5010


