[Corpora-List] blog dataset availability

Paula Chesley ches0045 at umn.edu
Sat Apr 4 01:34:29 UTC 2009


Hi corpora list members,

I'm looking for a pretty big blog dataset that is marked up for the
following attributes:
  - writer ID
  - blog ID
  - reader IDs (who will be writers of other blogs/entries)
  - time of publication
  - whether/how often blog ID is referenced by other blogs (as in network
    information)

The ICWSM 2009 dataset is *almost* what I'm looking for, but not
quite: it doesn't have specific trackback information, such as which
specific blogs (by URL) link to a given blog or to a given post on
that blog. I need this information to see how a linguistic variable
spreads in the blogosphere.

If you know about such a dataset, I'd appreciate any information you  
might have!

Thanks,
Paula

