[Corpora-List] Interlanguage links in Wikipedia

Alon Lischinsky alischinsky at gmail.com
Mon Jun 18 08:31:18 UTC 2012


Hi Nasrin,

> I want to access pages in Wikipedia that are in different languages and
> whose content is nearly or exactly equivalent. It
> seems interlanguage links have enough information for me.

I'm afraid this is going to be a rather noisy source of data.

For various reasons, interwiki links do not necessarily link
equivalent pages; for example, a topic that has an entry of its own in
one language version may correspond to just a section in a larger
article in a different one.

Besides, the content of the linked pages is only rarely equivalent:
articles in different languages are usually written independently,
with little cross-language feedback, and their degree of development
depends on the widely varying interests of different language
communities. To take an almost trivial example, it's unlikely that the
Arabic-language and Hebrew-language versions of the ‘Intifada’ entry
are going to be similar in any interesting sense.

It might be important to consider these issues, although that depends
on what you want these data for.

Cheers,

Alon

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


