[Corpora-List] Interlanguage links in Wikipedia

Fang Xu fangxu at lsv.uni-saarland.de
Sun Jun 17 20:04:47 UTC 2012


Hi Nasrin,
You can get snapshots of Wikipedia in different languages from the
following site:
http://dumps.wikimedia.org/backup-index.html

Dump processing is described in more detail at
http://en.wikipedia.org/wiki/Wikipedia:Database_download

There are several parsers for extracting elements from the XML dump file,
e.g.
medialab.di.unipi.it/wiki/Wikipedia_Extractor
wikipedia-miner.sourceforge.net
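
If you would rather not depend on one of those tools, the dump can also be
streamed with a few lines of Python. The following is only a rough sketch,
not how the parsers above work. It assumes that interlanguage links appear in
the page text in the form [[de:Titel]] and that a language code is two or
three lowercase letters with optional hyphenated suffixes; the function names
are made up for illustration. The pattern is a heuristic, so other short
lowercase prefixes would be picked up as well.

```python
import re
import xml.etree.ElementTree as ET

# Assumed form of an interlanguage link in page text: [[de:Titel]]
# Heuristic: a 2-3 letter lowercase code, optionally with "-xxx" suffixes.
LINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")


def local(tag):
    # Drop the XML namespace prefix, e.g. "{http://...}page" -> "page"
    return tag.rsplit("}", 1)[-1]


def iter_langlinks(source):
    """Yield (page_title, lang, target_title) from a dump file or file object."""
    title = None
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            for lang, target in LINK.findall(elem.text or ""):
                yield title, lang, target.strip()
        elif name == "page":
            elem.clear()  # release the content of the finished page
```

iter_langlinks accepts a file name or an open file object, so a compressed
dump can be passed in through bz2.open(path, "rb") without unpacking it first.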

The Wikimedia dump site also provides SQL files with the interlanguage links
between languages.
I have never used them myself; simply try them if you want.
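
For anyone who does want to try the SQL files, here is a minimal, untested
sketch of how the rows could be read without loading them into MySQL. It
assumes the file consists of INSERT INTO statements whose tuples have the
form (page_id,'lang','title'), i.e. the columns ll_from, ll_lang and ll_title
in that order; please check this against the CREATE TABLE statement at the
top of the file you download. The function names are made up for
illustration.

```python
import gzip
import re

# One tuple in an INSERT statement: (page_id,'lang','title')
# Assumed column order: ll_from, ll_lang, ll_title.
ROW = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")


def unescape(s):
    # Undo backslash escapes such as \' and \\ (naive: other escapes
    # like \n are not translated, only the backslash is dropped).
    return re.sub(r"\\(.)", r"\1", s)


def parse_langlinks(lines):
    """Yield (page_id, lang, title) for every tuple in the INSERT lines."""
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip comments, CREATE TABLE, LOCK statements, etc.
        for page_id, lang, title in ROW.findall(line):
            yield int(page_id), unescape(lang), unescape(title)


def read_dump(path):
    """Read a gzip-compressed SQL dump; `path` is whatever file you downloaded."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        yield from parse_langlinks(f)
```

The page_id refers to the page in the source-language wiki, so to get source
titles the rows still have to be joined with that wiki's page table or with
the titles from the XML dump.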

 Happy Hacking!!




On Fri, Jun 15, 2012 at 8:49 AM, Nasrin Baratali
<nasrin.baratali at gmail.com> wrote:

> To whom it may concern,
>
> I want to access Wikipedia pages that are in different languages and whose
> content is nearly or exactly equivalent. It seems interlanguage links have
> enough information for me. However, I do not know how to extract these
> links or the equivalent pages. I would appreciate it if anyone could help
> me.
>
> Regards,
>
> Nasrin Baratalipour
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
XU Fang  徐昉
Spoken Language Systems
Saarland University
66041 Saarbrücken
Tel. ++49 681 302 58128
Fax ++49 681 302 58123
Fang.Xu at LSV.Uni-Saarland.De <fang.xu at LSV.Uni-Saarland.De>