[Corpora-List] Wikipedia data, JWPL - A Java-based Wikipedia API released
sai deepak
tsaideepak at gmail.com
Mon Apr 27 06:40:02 UTC 2009
Sir,
I am T. Sai Deepak, an M.Tech student at IIT Roorkee. I am currently working on
"Paraphrase Detection".
For my work I need access to Wikipedia. I found your API very useful, but I
am not able to download the Wikipedia data, since it is served over an FTP
connection that requires authentication.
Is there any other way to download this data?
As described in the JWPL documentation, I downloaded the Wikipedia data
from http://download.wikimedia.org/backup-index.html
The three archives I downloaded are:
* [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2
* [LANGCODE]wiki-[DATE]-pagelinks.sql.gz
* [LANGCODE]wiki-[DATE]-categorylinks.sql.gz
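A minimal Python sketch, assuming the archives are reachable over plain HTTP(S): it shows how the [LANGCODE] and [DATE] placeholders expand into concrete file names and how the three files could be fetched without an FTP login. The base URL and the `<lang>wiki/<date>/` directory layout are assumptions based on the current Wikimedia dump site (older mirrors may differ), and the function names are hypothetical, not part of JWPL.

```python
import urllib.request
from pathlib import Path

# Assumed base URL and layout: <base>/<lang>wiki/<date>/<file>.
DUMP_BASE = "https://dumps.wikimedia.org"

# Suffixes of the three archives listed above.
SUFFIXES = (
    "pages-articles.xml.bz2",
    "pagelinks.sql.gz",
    "categorylinks.sql.gz",
)


def dump_file_names(langcode: str, date: str) -> list[str]:
    """Expand the [LANGCODE]wiki-[DATE]-... templates into concrete names."""
    return [f"{langcode}wiki-{date}-{suffix}" for suffix in SUFFIXES]


def dump_urls(langcode: str, date: str, base: str = DUMP_BASE) -> list[str]:
    """Build HTTP(S) URLs for the archives (hypothetical layout, see above)."""
    return [f"{base}/{langcode}wiki/{date}/{name}"
            for name in dump_file_names(langcode, date)]


def download_dumps(langcode: str, date: str, target: Path) -> None:
    """Download each archive into `target`; the files are large and slow to fetch."""
    target.mkdir(parents=True, exist_ok=True)
    for url in dump_urls(langcode, date):
        dest = target / url.rsplit("/", 1)[1]
        urllib.request.urlretrieve(url, dest)
```

For example, `dump_file_names("en", date)` yields the three English file names for a given dump date, where `date` is one of the dates listed on the backup index page.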
However, for most pages I get the error "Page not available", even though
the page exists on Wikipedia. Could you please suggest a solution to this
problem?
Thanks
Regards
T. Sai Deepak
M.Tech CSE
IIT Roorkee.
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora