[Corpora-List] Wikipedia data, JWPL - A Java-based Wikipedia API released

sai deepak tsaideepak at gmail.com
Mon Apr 27 06:40:02 UTC 2009


Sir,

I am T. Sai Deepak, an M.Tech student at IIT Roorkee. I am currently working
on "Paraphrase Detection".

For my work I need to access Wikipedia. I found your API very useful, but I
am not able to download the Wikipedia data, since it is offered over an FTP
connection that requires authentication.

Is there any other way to download this data?

As mentioned in the JWPL documentation, I have downloaded the Wikipedia
data from http://download.wikimedia.org/backup-index.html

The three archives I have downloaded are:

   * [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2
   * [LANGCODE]wiki-[DATE]-pagelinks.sql.gz
   * [LANGCODE]wiki-[DATE]-categorylinks.sql.gz
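
For illustration, the placeholder names above can be expanded for a concrete
language code and dump date, and a local folder can be checked for all three
files. This is only a rough sketch: the helper names, the language code "en"
and the date "20090401" are assumptions chosen for the example and are not
part of JWPL.

```python
import os

# File name patterns taken from the list above; [LANGCODE] and [DATE]
# are written here as {lang} and {date}.
DUMP_TEMPLATES = (
    "{lang}wiki-{date}-pages-articles.xml.bz2",
    "{lang}wiki-{date}-pagelinks.sql.gz",
    "{lang}wiki-{date}-categorylinks.sql.gz",
)


def dump_file_names(lang, date):
    """Return the three archive names for a language code and dump date."""
    return [t.format(lang=lang, date=date) for t in DUMP_TEMPLATES]


def missing_files(lang, date, directory):
    """Return the expected archive names that are not present in directory."""
    return [
        name
        for name in dump_file_names(lang, date)
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    # "en" and "20090401" are hypothetical example values.
    for name in dump_file_names("en", "20090401"):
        print(name)
    print("missing in current folder:", missing_files("en", "20090401", "."))
```

Checking that all three archives come from the same language and dump date
is only a basic sanity check on the download; it does not by itself explain
the error described below.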


But for most pages I get the error "Page not available", even though the
page exists in Wikipedia. Could you please suggest a solution to this
problem?

Thanks

Regards
T. Sai Deepak
M.Tech CSE
IIT Roorkee.
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

