[Corpora-List] Corpus from Blogs required.
Trilok Khairnar
trilokgk at gmail.com
Wed Mar 30 11:51:21 UTC 2005
Hello,
Is a corpus extracted from a variety of blogs available online (for
academic use)?
I would like to tag the texts in such a corpus and perform stylistic analysis on them.
Alternatively, is there an API for extracting the text of blog posts?
The XML-RPC API for Waypath (http://www.waypath.com/apis/) looks good,
but it does not seem to return the full text of posts, and the available
documentation is not very detailed.
In the absence of such a corpus or API, I am thinking of doing this by:
1] Using RSS and Atom feed parsers on some OPML files to get the URLs of blog posts
2] Extracting the text from each post (easier if the blog template format is known)
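
The two steps above could be sketched roughly as follows in Python, using only the standard library. This is a minimal illustration under several assumptions, not a tested crawler: the function names are hypothetical, the OPML files are assumed to store feed addresses in the usual xmlUrl attribute, only RSS 2.0 and Atom 1.0 feeds are handled (RSS 1.0/RDF and older Atom drafts would need extra namespaces), and the text extraction simply keeps all visible text because the blog template is not known. Downloading the feeds and pages is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: Atom 1.0 namespace; older Atom drafts use a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Step 1a: collect feed URLs from the xmlUrl attribute of <outline> elements."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def post_urls_from_feed(feed_text):
    """Step 1b: collect post URLs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_text)
    urls = []
    # RSS 2.0: <item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom: <entry><link rel="alternate" href="URL"/></entry>
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break
    return urls


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_from_html(html_text):
    """Step 2: template-agnostic extraction of all visible text from a page."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)
```

With a known blog template, text_from_html could be narrowed to the element that holds the post body, which would avoid picking up sidebars and comment sections.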
Thanks and Regards,
Trilok.