[Corpora-List] blog dataset availability
Paula Chesley
ches0045 at umn.edu
Sat Apr 4 01:34:29 UTC 2009
Hi corpora list members,
I'm looking for a pretty big blog dataset that is marked up for the
following attributes:
- writer ID
- blog ID
- reader IDs (who will be writers of other blogs/entries)
- time of publication
- whether/how often blog ID is referenced by other blogs (as in network
  information)
The ICWSM 2009 dataset is *almost* what I'm looking for, but not
quite: it doesn't have specific trackback information, such as which
specific blogs (by URL) link to a given blog or to a given post on
that blog. This information is necessary for me to see how a
linguistic variable spreads in the blogosphere.
If you know about such a dataset, I'd appreciate any information you
might have!
Thanks,
Paula